Preface


This book is a collection of research papers on a wide variety of multigrid topics, including
applications, computation and theory. These papers stem from the Third Copper Mountain
Conference on Multigrid Methods, which was held at Copper Mountain, Colorado, April 5-10,
1987. As such, this book represents the proceedings of that conference. However, each paper has
been subjected to the usual journal refereeing process and, in the opinion of the editors, each
represents a significant contribution to the multigrid field.

Some of the organizers of the Copper Mountain Conference acted as editors in this refereeing
process: Joel Dendy (Los Alamos National Laboratory), Jan Mandel (The University of
Colorado at Denver), Seymour Parter (University of Wisconsin-Madison), and John Ruge
(University of Colorado at Denver). We are grateful to these editors for their professional and
timely assistance. We are equally grateful to the referees.

The multigrid discipline has been steadily maturing, as is evident from the literature. The
vibrancy of this field is felt not only in the growth of this literature, but also in the dramatic
increase in its breadth and depth. These proceedings provide an almost overwhelming
example of this trend.
S. F. McCormick

2D Full Potential Flow Solver

INTRODUCTION

The full potential equation is given in a body-fitted coordinate system which consists of the
streamlines and the equipotential lines of the corresponding incompressible flow. Due to the
use of a special shock operator, the discrete solution satisfies Prandtl's shock condition and
is therefore more physical than the solutions of so-called fully conservative or
non-conservative schemes.

The full potential flow model is discretized on an unequally spaced mesh. In the
context of MG methods, this requires special smoothing algorithms consisting of a combination
of local, row and column relaxations.

For subsonic flows, the convergence factors for one (2,1)-W-cycle of the algorithm are
smaller than 0.1, as predicted by local mode analysis.

Because of the change in the discretization across shocks, which is necessarily
discontinuous when shock operators are used, monotonic and fast convergence cannot be
guaranteed for transonic flows when standard coarsening is used in the FAS scheme. But if
the grid is not coarsened in the flow direction around shocks, nearly the same speed of
convergence can be achieved as for subsonic flows.

A variety of numerical examples is given to show the effectiveness of the new adaptive
coarse grid strategy, which allows one to exploit the full power of the multigrid idea.


1. INTRODUCTION

Since the beginning of the exploration of MG-methods, full potential MG-solvers have suffered
either from slow convergence or from unreliable convergence. But most people were
content with an improvement in the speed of convergence that, while remarkable compared to
that of simple relaxation methods of any kind, was still poor. The most challenging potential of
MG-methods, however, has seldom been fully exploited: if the smoothing factor is about 0.5, which
it indeed can be for full potential problems, one should achieve a convergence factor of around
0.1 per (2,1)-W-cycle.
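The figure of 0.1 follows from the common rule of thumb that the convergence factor per cycle is roughly the smoothing factor μ raised to the total number of smoothing steps ν1 + ν2. This is a heuristic estimate from local mode analysis, not a rigorous bound:

$$\rho \approx \mu^{\nu_1+\nu_2} = 0.5^{2+1} = 0.125 \approx 0.1 .$$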

It has to be emphasized that, within the field of application of our code, the MG-algorithm
has to be much faster than the column relaxation, which takes about 800 to 1000
sweeps to converge. Otherwise the time necessary for the development,
implementation and integration of the MG-code would not be worthwhile.

For subsonic flows, this aim can clearly be reached. The convergence factors are less
than 0.1, independent of the particular airfoil and the free stream Mach number. The
computational effort for one cycle is about 20 times as large as for one sweep of a column
relaxation on the finest grid. A sufficiently accurate solution can be achieved within 3
cycles.
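The cost claims above can be turned into a rough speedup estimate. The sketch below only uses the numbers quoted in the text (about 20 work units per cycle, 3 cycles, 800 to 1000 column sweeps); one work unit is taken to be one column-relaxation sweep on the finest grid:

```python
# Rough cost comparison in work units (1 unit = one column-relaxation sweep on the finest grid)
work_per_cycle = 20          # "about 20" units per (2,1)-W-cycle, as quoted in the text
cycles = 3                   # cycles for a sufficiently accurate solution
column_sweeps = (800, 1000)  # sweeps needed by plain column relaxation

mg_work = work_per_cycle * cycles
speedups = [s / mg_work for s in column_sweeps]
print(mg_work, [round(s, 1) for s in speedups])  # 60 [13.3, 16.7]
```

So the multigrid solver is estimated to be roughly 13 to 17 times cheaper than column relaxation alone for subsonic cases.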

For transonic flows, the discrete solution is very sensitive to small perturbations.
Therefore the poor approximation quality of the standard coarse grid representation near
shocks and the errors introduced by the interpolation of corrections across shocks lead to
non-monotonic and very slow convergence in many flow situations. This can be avoided by a
new adaptive coarse grid strategy: around shocks the grid is not coarsened in the flow
direction. This method has turned out to be very robust and gives convergence factors around
0.1 for any flow problem, independent of the shock strength or other parameters. The
additional computational effort is at most 10 per cent of the work of a standard algorithm.


2. PROBLEM FORMULATION
2.1 Basic Flow Equations

Inviscid and irrotational flow around an airfoil can be described by the full potential
equation, which in Cartesian (x,y)-coordinates reads

$$(c^2 - u^2)\,\Phi_{xx} - 2uv\,\Phi_{xy} + (c^2 - v^2)\,\Phi_{yy} = 0. \qquad (1)$$


Φ is the potential of the velocity field; u := Φ_x and v := Φ_y are the streamwise and the
normal component of the velocity vector, respectively. c is the speed of sound, which is
determined by Bernoulli's law (with velocities scaled by the free stream velocity)

$$c^2 = \frac{1}{M_\infty^2} + \frac{\gamma - 1}{2}\left(1 - u^2 - v^2\right). \qquad (2)$$

M∞ is the free stream Mach number and γ is the ratio of specific heats (γ = 1.4 for air).
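As an illustration of how the local Mach number follows from Bernoulli's law, here is a small sketch. It assumes that velocities are scaled by the free stream speed, so that (2) reads c² = 1/M∞² + (γ−1)/2 (1 − u² − v²); the function names are hypothetical and not taken from the authors' code:

```python
import math

GAMMA = 1.4  # ratio of specific heats for air

def speed_of_sound_sq(u, v, mach_inf, gamma=GAMMA):
    """Local c^2 from Bernoulli's law, velocities scaled by the free-stream speed (assumed)."""
    q2 = u * u + v * v
    return 1.0 / mach_inf**2 + 0.5 * (gamma - 1.0) * (1.0 - q2)

def local_mach(u, v, mach_inf, gamma=GAMMA):
    """Local Mach number q / c."""
    c2 = speed_of_sound_sq(u, v, mach_inf, gamma)
    if c2 <= 0.0:
        raise ValueError("speed exceeds the maximum speed allowed by Bernoulli's law")
    return math.sqrt(u * u + v * v) / math.sqrt(c2)

print(local_mach(1.0, 0.0, 0.8))  # free-stream state: approximately 0.8
print(local_mach(1.3, 0.0, 0.8))  # accelerated flow: supersonic
```

A point is treated as supersonic when this local Mach number exceeds 1, which is what decides between the elliptic and hyperbolic discretizations described in section 3.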

The most promising discrete representation of the problem seems to be a body-conforming
grid. As we want to exploit some special information about the flow and to be
open to an effective and flexible 3D approach, we have chosen the grid to consist of the
streamlines and the equipotential lines of the incompressible flow, called i-streamlines and
i-equipotential lines in the following. The angle of attack of the undisturbed flow is taken
into account in the calculation of the grid by a panel method [8]. Hence the grid lines
nearly coincide with the streamlines of the compressible flow and their orthogonals.
Additionally, this gives a good initial guess for the final solution. A typical example of a
grid is shown in Figure 1.

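FIG. 1: Body-fitted coordinate system, RAE2822

Within this (φ, ψ)-coordinate system, where φ and ψ are the potential and the stream function
of the incompressible flow, the full potential equation reads

$$(c^2 - u^2)\,\Phi_{\varphi\varphi} - 2uv\,\Phi_{\varphi\psi} + (c^2 - v^2)\,\Phi_{\psi\psi} - \frac{u^2 + v^2}{2f}\left(f_\varphi\,\Phi_\varphi + f_\psi\,\Phi_\psi\right) = 0, \qquad (3)$$

where

$$u := \sqrt{f}\,\Phi_\varphi, \qquad v := \sqrt{f}\,\Phi_\psi. \qquad (4)$$

f := φ_x² + φ_y² (= ψ_x² + ψ_y²) is the Jacobian determinant of the coordinate transformation.
Its physical meaning is the squared velocity of the incompressible flow.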


2.2 Boundary Conditions

Along the surface of the airfoil the boundary condition is

$$\sqrt{f}\,\Phi_\psi = v_n. \qquad (5)$$

The values on the right-hand side are expected to be given by a boundary layer method
during an outer iteration process. For inviscid flows v_n is zero.

The i-streamline leaving the trailing edge of the airfoil is used for wake simulation.
Jumps of the normal and the tangential velocity component across this cut in the flow field
have to be allowed for in connection with boundary layer calculations. Even if these jumps
are zero, the potential itself can exhibit a jump across the wake line.

$$\sqrt{f}\,\Delta\Phi_\psi = \Delta v_n \qquad (6)$$

$$\sqrt{f}\,\Delta\Phi_\varphi = \Delta v_t \qquad (7)$$

Δ indicates the jump across the wake. For inviscid flow we have Δv_n = Δv_t = 0. In this case
the jump of the potential Φ is constant along the wake line and is equal to the circulation
Γ of the flow.
The circulation and the potential φ of the incompressible flow are evaluated from the grid
system. θ is the angle measured from the straight line connecting the origin inside the
airfoil with the far field end of the wake line (see Figure 2).


2.3 Shock Operator

The flow model given above is strictly correct only for inviscid and irrotational flow. But
it is also to be used for transonic flow calculations, where the condition of irrotationality
is violated if shocks are present. The pure application of so-called conservative or
non-conservative discretization schemes across shocks gives neither the right shock position
nor the right strength [10]. So we decided, like e.g. Boerstoel [4] or Murman [9], to use a
special shock point operator which guarantees a more physical behaviour of the discrete
solution across shocks [8].

The shock operator used here is an extension of the shock operator of Murman [9],
originally developed by Klevenhusen [7]. The goal is to achieve good agreement with the
shock jump condition of Prandtl [8], which follows directly from the Rankine-Hugoniot shock
jump conditions. In order to allow for a study of different discretizations, it is
convenient to express this operator in the form of a differential equation.

For normal shocks Prandtl's equation reads

$$c_*^2 - u_1 u_2 = 0. \qquad (9)$$

c* is the critical speed of sound, and u1 and u2 are the velocity components normal to the
shock immediately in front of and behind the shock, respectively.
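Prandtl's relation determines the downstream normal velocity from the upstream one. The sketch below assumes the same normalization as in the Bernoulli example (velocities scaled by the free stream speed), in which c² + (γ−1)/2 q² = (γ+1)/2 c*² is constant, so c*² = 2/(γ+1) (1/M∞² + (γ−1)/2). The function names are illustrative only:

```python
import math

def critical_speed_sq(mach_inf, gamma=1.4):
    """c_*^2, using c^2 + (gamma-1)/2 q^2 = (gamma+1)/2 c_*^2 (free-stream values: c^2 = 1/M^2, q = 1)."""
    total = 1.0 / mach_inf**2 + 0.5 * (gamma - 1.0)
    return 2.0 * total / (gamma + 1.0)

def downstream_normal_velocity(u1, mach_inf, gamma=1.4):
    """Prandtl's relation (9): u1 * u2 = c_*^2."""
    return critical_speed_sq(mach_inf, gamma) / u1

cs2 = critical_speed_sq(0.8)
u2 = downstream_normal_velocity(1.3, 0.8)
print(cs2, math.sqrt(cs2), u2)  # supersonic u1 > c_* gives subsonic u2 < c_*
```

Because u1 u2 = c*², a normal velocity above c* upstream always maps to one below c* downstream, which is the sharp supersonic-to-subsonic transition the shock operator is meant to enforce.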

The shock operator is constructed in the following way. Usually shocks appear in
regions where the ψ-derivatives of the potential are much smaller than the φ-derivatives.
If we neglect the ψ-derivatives, the full potential equation reduces to the one-dimensional
equation

$$(c^2 - u^2)\,\Phi_{\varphi\varphi} - \frac{u^2}{2f}\,f_\varphi\,\Phi_\varphi = 0. \qquad (10)$$

The term $(c^2 - u^2)\,\Phi_{\varphi\varphi}$ of (10), which is also a term of the original
differential operator in (3), is replaced by the correctly scaled Prandtl operator, i.e. (9)
multiplied by (γ+1)/2 and Φ_φφ:

$$\frac{\gamma + 1}{2}\left(c_*^2 - u_1 u_2\right)\Phi_{\varphi\varphi}. \qquad (11)$$
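The scaling factor (γ+1)/2 can be checked from Bernoulli's law written with the critical speed of sound, c² + (γ−1)/2 (u² + v²) = (γ+1)/2 c*², using v ≈ 0 as when the ψ-derivatives are neglected:

$$c^2 - u^2 = \frac{\gamma+1}{2}\,c_*^2 - \frac{\gamma-1}{2}\,u^2 - u^2 = \frac{\gamma+1}{2}\left(c_*^2 - u^2\right).$$

Hence $(c^2-u^2)\,\Phi_{\varphi\varphi} = \frac{\gamma+1}{2}(c_*^2-u^2)\,\Phi_{\varphi\varphi}$, and replacing $u^2$ by the product $u_1u_2$ of the velocities on both sides of the shock gives (11).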


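The result is the differential shock operator

$$\frac{\gamma + 1}{2}\left(c_*^2 - u_1 u_2\right)\Phi_{\varphi\varphi} - 2uv\,\Phi_{\varphi\psi} + (c^2 - v^2)\,\Phi_{\psi\psi} - \frac{u^2 + v^2}{2f}\left(f_\varphi\,\Phi_\varphi + f_\psi\,\Phi_\psi\right). \qquad (12)$$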


The same procedure holds for oblique shocks. According to [8], the only change needed is to
replace the critical speed of sound in (9) by the reduced critical speed of sound

$$\tilde c_*^2 = \frac{c_*^2 - \frac{\gamma - 1}{\gamma + 1}\,u_1^2\cos^2\theta}{\sin\theta\,\sin(\theta - \delta)}, \qquad (13)$$

where θ is the shock angle and δ is the deflection angle (see Figure 3). Both angles can
be calculated from the inviscid flow according to the equations given in [8].

FIG. 3: Shock model with shock angle θ and deflection angle δ
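A small sketch of the reduced critical speed of sound. It assumes (13) has the form c̃*² = [c*² − (γ−1)/(γ+1) u1² cos²θ] / [sin θ sin(θ − δ)], which is our reading of the equation; a useful consistency check is that for a normal shock (θ = 90°, δ = 0) it reduces to c*², so (9) is recovered:

```python
import math

def reduced_critical_speed_sq(c_star_sq, u1, theta, delta, gamma=1.4):
    """Reduced critical speed of sound squared for an oblique shock (assumed form of (13)).

    theta: shock angle, delta: flow deflection angle, both in radians.
    """
    num = c_star_sq - (gamma - 1.0) / (gamma + 1.0) * u1**2 * math.cos(theta) ** 2
    den = math.sin(theta) * math.sin(theta - delta)
    return num / den

print(reduced_critical_speed_sq(1.5, 1.3, math.pi / 2, 0.0))               # normal shock: 1.5
print(reduced_critical_speed_sq(1.5, 1.3, math.radians(60), math.radians(5)))
```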


3. DISCRETIZATION

The choice of the grid gives rise to the following discretization of the boundary value
problem described in section 2.


3.1 Differential Operator

In subsonic flow regions equation (3) is elliptic. All derivatives are discretized by
quasi-central differences on a non-equidistant grid as depicted in Figure 1. This yields a
9-point difference molecule.

In supersonic flow regions equation (3) is hyperbolic, and consequently all derivatives
in i-streamline direction are discretized by backward differences, including the velocity
component u. There is no need for a fully rotated difference scheme like the one described
in [5], because the grid is nearly aligned with the flow.

There are two transition points on each i-streamline when passing through the supersonic
region: the point where the flow enters the supersonic zone is called the sonic or parabolic
point, and the point where it leaves the zone is called the shock point.

A parabolic point is defined as the first supersonic point on an i-streamline. Equation (3)
has a hyperbolic character there, and the second derivatives in i-streamline direction have
to be discretized by backward differences. But the velocity component u has to remain
centrally differenced to avoid a change of type.

At a shock point, which is the first subsonic point on an i-streamline after some supersonic
points, the shock operator applies. Whether it is of elliptic or hyperbolic type depends on the
discretization of the velocity values u1 and u2, and the discretization of the second
derivatives in (12) depends on this decision. For the discretization of u1 and u2 we have
chosen the model depicted in Figure 4. It has turned out to be stable, and it gives good
results concerning the shock position and strength [8].
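The point types along one i-streamline can be summarized in a short sketch. This is a simplification for illustration: it classifies points only by the local Mach number at each grid point (a hypothetical input list ordered from upstream to downstream), whereas the actual scheme decides the type from the discretized velocities:

```python
def classify_points(mach):
    """Label each point on one i-streamline (upstream to downstream) by its operator type."""
    labels = []
    prev_supersonic = False
    for m in mach:
        supersonic = m > 1.0
        if supersonic and not prev_supersonic:
            labels.append("parabolic")    # first supersonic point: u still centrally differenced
        elif supersonic:
            labels.append("supersonic")   # backward differences in streamline direction
        elif prev_supersonic:
            labels.append("shock")        # first subsonic point after supersonic ones
        else:
            labels.append("subsonic")     # quasi-central differences
        prev_supersonic = supersonic
    return labels

print(classify_points([0.8, 0.95, 1.05, 1.2, 1.3, 0.85, 0.8]))
```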

In order to represent shock-free flows well too, the discrete shock operator has to be
shifted continuously into the subsonic operator, depending on the shock strength. The Mach
number immediately upstream of the shock can be taken as an indicator of the shock strength
on each streamline separately.

The grid singularity at the leading edge of the airfoil requires a special discretization
of equation (3). The singularity can be overcome by using the divergence form of the full
potential equation,

$$\operatorname{div}(\rho\,\mathbf{v}) = 0. \qquad (14)$$


FIG. 4: Discretization of the normal velocity components of the shock operator (shock location marked; u1: Φ_x backward differenced, u2: Φ_x forward differenced)


Becker 




The application of Gauss's divergence theorem to a control volume V as depicted in Figure 5
leads to the equation

$$\oint_{\partial V} \rho\,(\mathbf{v}\cdot\mathbf{n})\,ds = 0, \qquad (15)$$

where v·n is the outward directed normal velocity component.

As the free stream Mach number tends to zero, compressibility can be neglected. In
this case the potential φ of the incompressible flow, which is given by the grid, has to be a
very good approximation to the solution of the compressible flow equations. To guarantee this
consistency of the discrete flow representations, (15) has to be discretized using the
body-fitted and the Cartesian coordinate systems together. The part of the integral from
point no. 1 to point no. 9 (see Figure 5) is evaluated by the trapezoidal rule applied in the
body-fitted coordinate system. This is no longer possible for the part from point
no. 9 to point no. 1 along the surface of the airfoil. There the trapezoidal rule is applied
in the Cartesian coordinate system.

The grid singularity at the trailing edge is treated in a much simpler way: the trailing
edge is never allowed to be a grid point, and all discretizations are done across this point
using the body-fitted system without any modification.
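The Cartesian part of this evaluation amounts to applying the trapezoidal rule to the flux integral (15) along a polygonal contour. The sketch below shows only that generic step, for a closed counter-clockwise polygon with mass flux values ρv given at its vertices; the body-fitted part of the leading-edge control volume is not reproduced, and the function name is hypothetical:

```python
def boundary_flux(points, flux):
    """Trapezoidal-rule approximation of the closed contour integral of F.n ds.

    points: (x, y) vertices of the control-volume boundary, counter-clockwise.
    flux:   (Fx, Fy) = rho * v at the same vertices.
    """
    total = 0.0
    n = len(points)
    for k in range(n):
        (xa, ya), (xb, yb) = points[k], points[(k + 1) % n]
        (fxa, fya), (fxb, fyb) = flux[k], flux[(k + 1) % n]
        dx, dy = xb - xa, yb - ya
        fx, fy = 0.5 * (fxa + fxb), 0.5 * (fya + fyb)   # trapezoidal average on the edge
        total += fx * dy - fy * dx                      # outward normal * length = (dy, -dx)
    return total

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(boundary_flux(square, square))  # F = (x, y): net outflow = div F * area = 2
```

For a discrete solution, the residual of the leading-edge equation is this net flux, which should vanish; a uniform flux field gives zero for any closed polygon, matching the exact integral.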


FIG. 5: Control volume for discretization at the leading edge

3.2 Boundary and Cut Conditions

The boundary conditions on the profile are discretized by second-order one-sided difference
formulas. Along the cut there is a grid overlap of three points, and the normal
derivatives are expressed by central differences. The tangential derivatives are also given by
central difference formulas, but they approximate the velocity at the midpoint
between two grid points. Central differencing about the grid points themselves would give
rise to oscillations of the local circulation along the wake and would be difficult to use
with the relaxation algorithms described below.

In inviscid calculations the discrete equations guarantee a transport of the circulation
from the trailing edge to the far field boundary. To obtain the right value of the circulation
in viscous calculations too, the circulation has to be evaluated from the rightmost jump
equation on the wake.


4. MULTIGRID ANALYSIS

The first multigrid approach used for our problem is a standard one. The sequence of coarser
grids is constructed from a finest grid by doubling the mesh sizes in each direction. The
cycling is done by (2,1)-W-cycles, and the FMG-FAS mode (full multigrid, full approximation
scheme) is used. The individual components of the algorithm are as follows.
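To make the cycle structure concrete, here is a minimal FAS (2,1)-W-cycle sketch. It is an illustration under strong simplifying assumptions, not the authors' solver: the problem is a hypothetical 1D nonlinear model, −u″ + u³ = f on (0,1) with u(0) = u(1) = 0, point Gauss-Seidel with one Newton step per point replaces column relaxation, the approximation is injected to the coarse grid, and residuals are restricted by full weighting:

```python
import math

def apply_op(u, h):
    """A(u)_i = (2u_i - u_{i-1} - u_{i+1}) / h^2 + u_i^3 at interior points (0 on the boundary)."""
    n = len(u) - 1
    a = [0.0] * (n + 1)
    for i in range(1, n):
        a[i] = (2 * u[i] - u[i - 1] - u[i + 1]) / h**2 + u[i] ** 3
    return a

def smooth(u, f, h, sweeps):
    """Nonlinear Gauss-Seidel: one Newton step per point, lexicographic order."""
    for _ in range(sweeps):
        for i in range(1, len(u) - 1):
            res = f[i] - ((2 * u[i] - u[i - 1] - u[i + 1]) / h**2 + u[i] ** 3)
            u[i] += res / (2 / h**2 + 3 * u[i] ** 2)
    return u

def restrict(r):
    """Full weighting (1/4, 1/2, 1/4); boundary entries stay zero."""
    nc = (len(r) - 1) // 2
    rc = [0.0] * (nc + 1)
    for j in range(1, nc):
        rc[j] = 0.25 * r[2 * j - 1] + 0.5 * r[2 * j] + 0.25 * r[2 * j + 1]
    return rc

def prolong(e):
    """Linear interpolation of a coarse-grid correction to the next finer grid."""
    nc = len(e) - 1
    ef = [0.0] * (2 * nc + 1)
    for j in range(nc + 1):
        ef[2 * j] = e[j]
    for j in range(nc):
        ef[2 * j + 1] = 0.5 * (e[j] + e[j + 1])
    return ef

def fas_cycle(u, f, nu1=2, nu2=1, gamma=2):
    """One FAS cycle, updating u in place; gamma=2 gives a W-cycle."""
    n = len(u) - 1
    h = 1.0 / n
    if n <= 2:                                    # coarsest grid: just relax many times
        return smooth(u, f, h, 50)
    smooth(u, f, h, nu1)                          # pre-smoothing
    au = apply_op(u, h)
    r = [f[i] - au[i] for i in range(n + 1)]
    uc = u[::2]                                   # injected approximation on the coarse grid
    auc = apply_op(uc, 2 * h)
    rc = restrict(r)
    fc = [auc[j] + rc[j] for j in range(len(uc))]  # FAS coarse right-hand side
    vc = list(uc)
    for _ in range(gamma):
        fas_cycle(vc, fc, nu1, nu2, gamma)
    corr = prolong([vc[j] - uc[j] for j in range(len(uc))])
    for i in range(1, n):
        u[i] += corr[i]
    smooth(u, f, h, nu2)                          # post-smoothing
    return u

def residual_norm(u, f, h):
    au = apply_op(u, h)
    return max(abs(f[i] - au[i]) for i in range(1, len(u) - 1))

# Usage: right-hand side chosen so that u = sin(pi x) solves the continuous problem.
n = 64
h = 1.0 / n
x = [i * h for i in range(n + 1)]
f = [math.pi**2 * math.sin(math.pi * xi) + math.sin(math.pi * xi) ** 3 for xi in x]
u = [0.0] * (n + 1)
norms = [residual_norm(u, f, h)]
for _ in range(6):
    fas_cycle(u, f)
    norms.append(residual_norm(u, f, h))
factors = [norms[k + 1] / norms[k] for k in range(len(norms) - 1)]
print([round(q, 3) for q in factors])
```

The printed ratios are residual reduction factors per cycle. The full approximation scheme carries the full solution, not just a correction, to the coarse grids, which is what allows the nonlinear coefficients to be re-evaluated there.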


4.1 Smoothing

Smoothing of the residuals is done by block relaxation algorithms. If we first assume the
grid to be nearly equidistant with a mesh size ratio of about 1, the most favourable algorithm
is a Gauss-Seidel column relaxation marching in the downstream direction [1]. In order to
have good smoothing across the coordinate cuts in front of the profile and behind it, the
columns should not be split at the cut. Instead, the jump conditions should be satisfied
simultaneously with the difference equations for the full potential equation. This can be
done without disturbing the tridiagonal structure of the column equation system. Similarly,
the Neumann boundary conditions on the surface of the airfoil and the leading edge equation
have to be solved simultaneously with the current column.
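The core of a column relaxation is a tridiagonal solve per column. The sketch below shows a Gauss-Seidel column sweep marching in +x (standing in for the downstream direction) on a hypothetical linear model problem, the anisotropic equation a(2u_ij − u_(i−1)j − u_(i+1)j)/hx² + b(2u_ij − u_i(j−1) − u_i(j+1))/hy² = f_ij with zero Dirichlet data. The jump conditions, Neumann conditions and frozen-coefficient linearization of the real code are not included:

```python
def thomas(sub, diag, sup, rhs):
    """Solve a tridiagonal system (sub[0] and sup[-1] are ignored)."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = sup[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for k in range(1, n):
        m = diag[k] - sub[k] * c[k - 1]
        c[k] = sup[k] / m if k < n - 1 else 0.0
        d[k] = (rhs[k] - sub[k] * d[k - 1]) / m
    x = [0.0] * n
    x[-1] = d[-1]
    for k in range(n - 2, -1, -1):
        x[k] = d[k] - c[k] * x[k + 1]
    return x

def column_sweep(u, f, a, b, hx, hy):
    """One Gauss-Seidel column relaxation sweep; u[i][j] with i along x, j along the column."""
    nx, ny = len(u) - 1, len(u[0]) - 1
    cx, cy = a / hx**2, b / hy**2
    m = ny - 1
    for i in range(1, nx):
        sub, sup, diag = [-cy] * m, [-cy] * m, [2 * cx + 2 * cy] * m
        # column i-1 already holds new values (Gauss-Seidel), column i+1 still old ones
        rhs = [f[i][j] + cx * (u[i - 1][j] + u[i + 1][j]) for j in range(1, ny)]
        rhs[0] += cy * u[i][0]          # Dirichlet values at both ends of the column
        rhs[-1] += cy * u[i][ny]
        u[i][1:ny] = thomas(sub, diag, sup, rhs)
    return u

def residual_max(u, f, a, b, hx, hy):
    nx, ny = len(u) - 1, len(u[0]) - 1
    cx, cy = a / hx**2, b / hy**2
    return max(abs(f[i][j] - cx * (2 * u[i][j] - u[i - 1][j] - u[i + 1][j])
                   - cy * (2 * u[i][j] - u[i][j - 1] - u[i][j + 1]))
               for i in range(1, nx) for j in range(1, ny))

n = 8
h = 1.0 / n
f = [[1.0] * (n + 1) for _ in range(n + 1)]
u = [[0.0] * (n + 1) for _ in range(n + 1)]
for _ in range(100):
    column_sweep(u, f, 1.0, 1.0, h, h)
print(residual_max(u, f, 1.0, 1.0, h, h))
```

When the coupling along the columns dominates (b much larger than a), a single sweep already nearly solves the problem, because each column solve is then almost exact.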

The local linearization of the equations of one column is done by freezing the
coefficients. These coefficients, i.e. the velocity values, should be calculated using
only so-called old potential values from the previous iterate. This has turned out to be a
little more stable than using the newest values, especially in the hyperbolic region [1].
Additionally, in the hyperbolic region the pure Gauss-Seidel algorithm can be modified to
use a proper combination of old and new potential values. This can sometimes also improve
the stability of the relaxation process [1], [5], especially on coarse grids.
FIG. 6: Lines for local relaxation sweeps around the leading edge (lines 1, 2 and 3)

To improve the rate of convergence, local relaxation sweeps are necessary because of the
grid singularity in the leading edge region. We took some special lines running around
the nose of the airfoil (see e.g. Figure 6). In general, relaxing only 4-5 lines suffices to
achieve the overall convergence factor predicted by local mode analysis. Table 1 gives a
typical result. The additional computational effort, expressed as the ratio of CPU times
with and without the local sweeps, is typically less than 1.01.

In the case of highly varying grid aspect ratios, which is the case in the code
presently used, the whole problem is strongly anisotropic. The significant coefficients
change from 10^-5 to 10^5 several times throughout the region. So the smoothing factors [3]
of the column relaxation are nearly 1 in some parts of the region and very small in other
parts. A typical case is given in Figure 7, where the local smoothing factors are shown. The
smoothing factors are calculated for the column relaxation by locally freezing the velocity
values from a calculated solution.
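The effect of anisotropy on column relaxation can be reproduced with a local mode (Fourier) analysis of a frozen-coefficient model operator a D_xx + b D_yy, where a couples neighbouring columns (flow direction), b couples points within a column, and the mesh sizes are absorbed into a and b. This is a sketch under those assumptions, not the analysis code behind Figure 7. For each high-frequency mode it computes the amplification factor λ of a Gauss-Seidel column sweep marching in +x and returns the maximum:

```python
import cmath
import math

def column_gs_smoothing_factor(a, b, samples=64):
    """Local mode smoothing factor of Gauss-Seidel column relaxation for a*Dxx + b*Dyy.

    Columns are lines of constant x, solved exactly, marching in +x.
    """
    mu = 0.0
    for p in range(-samples // 2, samples // 2):
        for q in range(-samples // 2, samples // 2):
            t1 = 2 * math.pi * p / samples      # frequency across columns
            t2 = 2 * math.pi * q / samples      # frequency along a column
            if max(abs(t1), abs(t2)) < math.pi / 2:
                continue                        # low frequency: left to the coarse grid
            denom = 2 * a + 2 * b * (1 - math.cos(t2)) - a * cmath.exp(-1j * t1)
            lam = a * cmath.exp(1j * t1) / denom
            mu = max(mu, abs(lam))
    return mu

print(column_gs_smoothing_factor(1.0, 1.0))   # isotropic case: 1/sqrt(5), about 0.447
print(column_gs_smoothing_factor(1e4, 1.0))   # strong coupling across columns: close to 1
```

When the coupling across columns dominates, modes that are smooth along x but oscillatory along the column are hardly damped, so μ approaches 1. This is the situation that motivates adding row relaxation.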


TABLE 1: Influence of local relaxation sweeps on the convergence factor

Number of local sweeps               0      1      2      3      4      5      9
NACA0012, α = 0°, M∞ = 0.6
  (only column relaxation)       0.248  0.221  0.158  0.138  0.108  0.085  0.085
RAE2822, α = 0°, M∞ = 0.4
  (row and column relaxation)    0.150  0.148  0.127  0.093  0.084  0.084  0.084
The MG-convergence deteriorates in line with the poor smoothing behaviour. To avoid
this, the smoothing process has been augmented by a row relaxation algorithm, which is used
alternately with the column relaxation. In order to keep the algorithm simple, the jump
conditions along the wake are skipped and left to the column relaxation part. But the
leading edge equation as well as the Neumann boundary conditions are included in the row
relaxation.

The linearized system of equations per row has a 4-diagonal form due to the backward
differencing in the hyperbolic region. We did not try to reduce this to a tridiagonal form
by taking the leftmost potential value from the previous iterate.


4.2 Intergrid Transfer

The usual full residual weighting operator for equally spaced grids,

$$I_h^{2h} = \frac{1}{16}\begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix}, \qquad (16)$$

can be interpreted as a discrete form of the integral equation


$$\int_V r^{2h}\,dV = \int_V r^{h}\,dV, \qquad (17)$$

where the residuals r^h are considered as a continuous function over the control volume V.
This form of weighting works very well for the full potential equation in a Cartesian
coordinate system [2].

The boundary value problem considered here is obtained by transforming that coordinate
system into a body-fitted system. Consequently, a more sophisticated full weighting will be
the discrete analogue of the transformed integral relation

$$\iint r^{2h}\,\frac{d\varphi\,d\psi}{f} = \iint r^{h}\,\frac{d\varphi\,d\psi}{f}, \qquad (18)$$

which is the following point-dependent averaging operator
When using unequally spaced grids, the full weighting operator becomes more complex because
of the mesh-dependent weighting coefficients. It then exactly represents the discrete
integration rules for non-equidistant grids.

Similar weighting operators can be constructed for the boundary equations, too.

Our experience with both equidistant and non-equidistant grids is that there is no need
for the transformed full weighting. The gain in the speed of convergence is at most 1 per
cent, which does not justify the additional numerical effort (see Figure 8). So we decided to
use the usual full weighting operator (16), adapted to the unequally spaced grid.
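The paper does not spell out how (16) is adapted to unequally spaced grids. One common possibility, shown here in 1D as an assumption rather than the authors' formula, is to use the transpose of linear interpolation, normalized so that the weights sum to one. On a uniform grid this reduces to the familiar (1/4, 1/2, 1/4), and the 2D operator would be the tensor product of such 1D weights:

```python
def full_weighting_1d(x, r):
    """Restrict fine-grid residuals r (at nodes x) to the coarse nodes x[0::2].

    Weights: transposed linear-interpolation weights, normalized to sum to one.
    Boundary entries of the result are left at zero.
    """
    n = len(x) - 1                          # number of fine intervals, assumed even
    rc = [0.0] * (n // 2 + 1)
    for I in range(1, n // 2):
        i = 2 * I
        wl = (x[i - 1] - x[i - 2]) / (x[i] - x[i - 2])   # weight of fine node x[i-1]
        wr = (x[i + 2] - x[i + 1]) / (x[i + 2] - x[i])   # weight of fine node x[i+1]
        rc[I] = (wl * r[i - 1] + r[i] + wr * r[i + 1]) / (wl + 1.0 + wr)
    return rc

x_uniform = [k / 8 for k in range(9)]
print(full_weighting_1d(x_uniform, [0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]))
```

With the normalization, a constant residual is restricted to the same constant on any stretched grid.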

The corrections calculated in the coarse grid correction step are transferred to
the next finer grid. Within our approach we only used bilinear interpolation, because our
experience with more complicated interpolation schemes was negative (see e.g. [1]).


4.3 Solution on the Coarsest Grid

The approximate "solution" on the coarsest grid is obtained by several (up to 30) relaxation
sweeps, for convenience. Because the number of unknowns on this grid is rather large due to
the H-topology of the grid (about 100 to 200 points), the numerical effort becomes a
significant part of the overall costs. But as there is little hope of finding a fast and cheap
solver for the nonlinear equations, we rely upon the fact that a poor, approximate solution of
the system will be sufficient for multigrid purposes.

FIG. 8: Convergence history, comparison between standard (×) and transformed (•) full weighting, NACA0012, M∞ = 0.6, α = 0°


5. ADAPTIVE MULTIGRID STRATEGY FOR SHOCK PROBLEMS

For transonic flows, a physically realistic approximation requires the use of a special shock
operator. The task of this operator is to model the discontinuity of the velocity component
normal to the shock somewhat better than a pure full potential equation model can. One
gets a sharp transition from the supersonic to the subsonic region, at which the flow
satisfies Prandtl's shock jump condition.

The discontinuity of the normal derivative at shocks gives rise to convergence problems
for iterative methods, because there is a discontinuous switching between the different
discrete operators that makes the solution very sensitive to small perturbations. In
multigrid algorithms using standard coarsening, sufficiently small changes in Mach number
cannot be guaranteed near shocks. First, the coarse grid operator approximates the fine grid
one rather poorly around shocks, and secondly, the interpolation of the corrections across
shocks may not be smooth. So the velocity values on the fine grid may experience rather large
changes. This can cause changes in the shock position and thereby disturb the monotonic
convergence. Figure 9 shows the convergence history of a typical "non-converging" multigrid
run. The airfoil is a usual NACA0012 airfoil at a free stream Mach number of 0.84 and zero
angle of attack. Inspecting the individual iterates, one observes oscillating shock positions.

FIG. 9: Convergence history for a non-convergent case, standard grid coarsening, NACA0012, M∞ = 0.84, α = 0° (horizontal axis: number of cycles)

FIG. 10: Convergence history for a convergent case, standard grid coarsening, NACA0012, M∞ = 0.83, α = 0°

FIG. 11: Sequence of three consecutive grids near a shock (only the φ-direction plotted)

FIG. 12: Convergence history for two transonic flows, adaptive grid coarsening, NACA0012, α = 0° (× M∞ = 0.83, • M∞ = 0.84)

A large number of numerical tests has made clear that the effect of non-convergence is
strongly related to the position of the shock relative to all coarser grids. If a column of
every coarser grid is located within one mesh width of the finest grid next to the shock, the
convergence will be as fast as in subsonic flows. A typical result is obtained for a free
stream Mach number of 0.83, which is slightly less than in the non-convergent case mentioned
above. The convergence factor for this case is 0.11 (see Figure 10).

A robust and reliable way out of the difficulty of non-convergence has been found in the
following adaptive grid coarsening strategy. In principle, this strategy improves the
approximation quality of the coarse grid operator and at the same time avoids interpolation
of corrections across shocks. In the neighbourhood of shocks, no coarsening is done in the
flow direction. Compared to standard coarsening, only a few more columns of each finer
grid are taken over to the next coarser one. The columns to be retained can be found
adaptively during the iteration process, i.e. depending on the current location of the shock
on the finest grid. Figure 11 shows a typical fine grid near a shock and two adaptively
chosen coarse grids. No severe changes in the computer program are necessary, and the
additional computational effort for reconstructing and using the enlarged coarse grids during
the iteration process is typically about 10 per cent.
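The column selection can be sketched as follows. Each grid is represented by the sorted list of finest-grid column indices it contains; standard coarsening keeps every second column, and in addition every column within a band around the current shock position is retained. The parameter `band` (in finest-grid mesh widths) and the function names are hypothetical illustrations of the strategy, not the authors' code:

```python
def coarsen(columns, shock, band=1):
    """Return the columns (finest-grid indices) of the next coarser grid.

    columns: sorted finest-grid column indices present on the current grid
    shock:   finest-grid column index of the current shock position
    band:    half-width of the region where no coarsening is done (hypothetical)
    """
    kept = set(columns[::2])                 # standard coarsening: every second column
    kept.add(columns[-1])                    # keep the last (boundary) column
    kept.update(c for c in columns if abs(c - shock) <= band)   # no coarsening near the shock
    return sorted(kept)

def grid_hierarchy(n_fine, shock, levels, band=1):
    grids = [list(range(n_fine + 1))]
    for _ in range(levels - 1):
        grids.append(coarsen(grids[-1], shock, band))
    return grids

for g in grid_hierarchy(16, 9, 3):
    print(g)
```

Because the shock column and its neighbours are present on every level, corrections are never interpolated across the shock, and the hierarchy can be rebuilt cheaply whenever the shock moves on the finest grid.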

Figure 12 shows convergence histories for both the convergent and the non-convergent
cases mentioned above, now with the new coarse grid strategy. There is no longer any
difference in the speed of convergence between the two cases, and the performance can be
called highly effective.


6. NUMERICAL RESULTS

In this section we give some numerical results concerning the speed of convergence of the
different methods used. Some plots of pressure curves are given for illustration.


6.1 Subsonic Flows

Many examples of subsonic flows have been tested. We can restrict the presentation to two
typical cases with the NACA0012 airfoil.

The first one is a flow without lift. The free stream Mach number of 0.72 is just below
the critical Mach number. The speed of convergence obtained with a (2,1)-W-cycle on 3 grids
can be seen from Figure 13. Both the residual and the maximum Mach number converge very
fast. The convergence factor per cycle is about 0.04, and the computational costs are about
21 work units, one work unit being the computational work for one sweep of an ordinary
column relaxation.

The second test case is a flow with lift: M∞ = 0.63, α = 2°. As shown in Figure 14, the
speed of convergence is as high as without lift.

TABLE 2: Influence of Mach number on the convergence factor, NACA0012, α = 0°

Free stream Mach number            0.1    0.72   …      0.80   0.83   0.84   0.85   0.86
Maximum local Mach number          0.12   0.99   1.08   1.25   1.32   1.34   1.36   1.38
Work/cycle (equiv. col. relax.)    21.3   21.3   22.7   23.5   23.5   23.5   23.5   23.5
Mean convergence factor            0.04   0.04   0.04   0.06   0.06   0.06   0.06   0.06


6.2 Transonic Flows

The effect of an increase in the free stream Mach number on the speed of convergence of the
adaptive multigrid algorithm is shown in Table 2. For the flow around a closed NACA0012
airfoil at zero angle of attack, the Mach number is varied from 0.1 to 0.86. As the maximum
local Mach number indicates, we have covered the whole physically significant regime of
applicability of the full potential flow model. The convergence factor per cycle shows no
significant dependence on the flow situation. The number of work units per cycle naturally
increases due to the adaptive grid coarsening when the flow becomes transonic, but the
additional computational effort is only about 10 per cent.
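The figures in Table 2 can be combined into a rough cost estimate. The sketch below computes the relative extra work of the adaptive coarsening and, for a hypothetical target of reducing the residual by a factor of 10^6 (an illustrative choice, not a criterion from the paper), the number of cycles and the total work at the transonic convergence factor:

```python
import math

work_subsonic, work_transonic = 21.3, 23.5   # work units per cycle (Table 2)
rho = 0.06                                   # mean convergence factor, transonic cases (Table 2)
target = 1e-6                                # hypothetical residual reduction target

extra = work_transonic / work_subsonic - 1.0
cycles = math.ceil(math.log(target) / math.log(rho))
print(f"{extra:.1%}", cycles, round(cycles * work_transonic, 1))  # 10.3% 5 117.5
```

Even for strongly transonic flows, about 120 column-sweep equivalents would suffice under these assumptions, compared with the 800 to 1000 sweeps of plain column relaxation quoted in the introduction.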

As a last example, Figure 15 shows the convergence history for a transonic flow with lift: airfoil NACA0012 at M∞ = 0.70 and α = 2°. The result is essentially the same as for flows without lift.


7.  CONCLUSION 

The new adaptive grid coarsening strategy within the FAS multigrid code has turned out to yield a robust and very fast solution algorithm for subsonic and transonic flows. Only minor additional work is necessary to guarantee that the speed of convergence is as high as for simple elliptic model problems and independent of the flow situation. Thus full multigrid efficiency has been achieved even for a nonlinear problem of mixed type.

The grid coarsening strategy can easily be incorporated into other kinds of full potential codes and will hopefully lead to similar gains in efficiency there as it has in our approach.

FIG. 13: Convergence history, NACA0012, M∞ = 0.72, α = 0°
FIG. 14: Convergence history, NACA0012, M∞ = 0.63, α = 2°
FIG. 15: Convergence history, NACA0012, M∞ = 0.70, α = 2°
1. INTRODUCTION: APPROXIMATE PSEUDO-INVERSES

Consider the singular linear operator $A: X \to Y$ and the problem of constructing, given some element $y \in Y$, an $x \in X$ that satisfies the equation

$$Ax = y$$

if this equation has a solution, which is the case when $y \in \mathcal{R}(A)$, and comes as close as possible otherwise. To be precise, we wish to construct an $x \in X$ which will minimize the norm of the residual

$$r = y - Ax$$

when an exact solution is not possible. If the null space $\mathcal{N}(A)$ is non-trivial, there will be more than one such $x$, and the Moore-Penrose solution $x^+$ is defined to be the smallest such, the one of minimum norm. This map, which associates the pseudo-solution $x^+$ with each $y \in Y$, turns out to be a linear operator, and is called the Moore-Penrose pseudo-inverse $A^+$ of the operator $A$. For a more detailed discussion we recommend, in addition to the original papers of Moore [20] and Penrose [22], the paper of Ben-Israel [5] and the recent book of Ben-Israel and Greville [4].

The standard algorithm for the construction of $A^+$, useful when a matrix representation for $A$ is available, begins with the singular value decomposition of $A$. This results in a matrix representation for the linear operator $A^+$, but the $O(n^3)$ cost makes it useless for a large problem. (For our purposes a problem is large when $n$ reaches a million.) An alternative is to represent $A^+$ not by a matrix but by an efficient algorithm, one which will construct the Moore-Penrose solution $x^+$ from any given $y \in Y$. This is the approach followed, for example, by Keller [17], who generalizes SOR to singular $A$, and by Björck and Elfving [8,13] and Nashed and Kammerer [16], who generalize the preconditioned conjugate gradient algorithm of Concus, Golub, and O'Leary [12].
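As a small-scale illustration of the SVD route (feasible only for small $n$, as noted above), the following Python/NumPy sketch builds $A^+$ from the singular value decomposition of a hypothetical 3 by 3 matrix of rank 2, cross-checks it against `np.linalg.pinv`, and confirms that adding a null-space vector to $x^+$ leaves the residual unchanged while increasing the norm. The rank tolerance `1e-10 * s[0]` is an ad hoc choice for this example.

```python
import numpy as np

# Hypothetical rank-2 example: rows 1 and 2 are parallel, so A is singular.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0]])
y = np.array([1.0, 0.0, 2.0])        # not in R(A): every range vector has y_2 = 2 y_1

# SVD A = U diag(s) V^T; invert only the singular values above a rank tolerance.
U, s, Vt = np.linalg.svd(A)
tol = 1e-10 * s[0]
s_inv = np.zeros_like(s)
s_inv[s > tol] = 1.0 / s[s > tol]
A_plus = Vt.T @ np.diag(s_inv) @ U.T  # dense matrix representation of A^+

x_plus = A_plus @ y                   # Moore-Penrose solution

# Adding a null-space vector leaves the residual unchanged but increases the norm.
null_vec = Vt[-1]                     # right singular vector for the zero singular value
x_other = x_plus + 0.5 * null_vec
```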

We are interested in using, as an algorithmic representation for the Moore-Penrose pseudo-inverse $A^+$, the simple iterative defect correction algorithm

algorithm API-DC

$$r^n = y - Ax^n, \qquad x^{n+1} = x^n + Zr^n,$$

in which $Z$ is an approximate pseudo-inverse of $A$. To be precise, we will call the linear operator $Z: Y \to X$ an approximate pseudo-inverse of $A$ if for some $\epsilon < 1$

$$\|(Z - ZAZ)v\| \le \epsilon\,\|Zv\| \quad \forall v \in Y, \qquad \mathcal{N}(Z) \perp \mathcal{R}(A), \qquad \mathcal{R}(Z) \perp \mathcal{N}(A).$$

Here $\mathcal{R}(A)$ and $\mathcal{N}(A)$ denote the range and null space, respectively, of the linear operator $A$.

The most elementary example of an approximate pseudo-inverse is $Z = \gamma A^*$ for sufficiently small $\gamma > 0$. More generally, $A^*$ times a polynomial in $AA^*$ automatically satisfies the orthogonality requirements of the definition and can be efficiently implemented for large sparse problems. If the spectrum of the operator $A^*A$ is known, or can be bounded, the coefficients of this polynomial may be chosen so that the nonzero eigenvalues of $ZA$ lie in the interval $(1 - \epsilon, 1 + \epsilon)$ for some $\epsilon < 1$.
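A minimal sketch of algorithm API-DC with the elementary choice $Z = \gamma A^*$, assuming a hypothetical test operator: the singular 1-D periodic three-point Laplacian, whose null space consists of the constants. With $\gamma = 1/\sigma_{\max}^2$ the rate in the definition is $\epsilon = \max |1 - \gamma\sigma^2|$ over the nonzero singular values $\sigma$. The example shows the iterates approaching the Moore-Penrose solution computed by `np.linalg.pinv`, and also how close to 1 this $\epsilon$ already is for only 8 unknowns, which is what motivates better choices of $Z$.

```python
import numpy as np

def periodic_laplacian(n):
    """1-D periodic 3-point Laplacian; singular, its null space is the constants."""
    A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    A[0, -1] = A[-1, 0] = -1.0
    return A

def api_dc(A, Z, y, x0, iters):
    """Defect correction: r^n = y - A x^n, x^{n+1} = x^n + Z r^n."""
    x = x0.copy()
    for _ in range(iters):
        x = x + Z @ (y - A @ x)
    return x

n = 8
A = periodic_laplacian(n)
sigma2 = np.linalg.eigvalsh(A.T @ A)     # squared singular values of A
gamma = 1.0 / sigma2.max()               # "sufficiently small": 0 < gamma < 2 / max sigma^2
Z = gamma * A.T                          # Z = gamma A^*

# The rate in the definition: eps = max |1 - gamma sigma^2| over nonzero sigma.
nonzero = sigma2[sigma2 > 1e-10]
eps = np.max(np.abs(1.0 - gamma * nonzero))

rng = np.random.default_rng(1)
y = rng.standard_normal(n)               # generic data: not in R(A)
x = api_dc(A, Z, y, np.zeros(n), iters=2000)
x_plus = np.linalg.pinv(A) @ y
```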

Our primary example of a very efficient approximate pseudo-inverse $Z$, and one which does not have a sparse matrix representation in the traditional sense, is the multigrid algorithm FAPIN (Fast Approximate Pseudo-Inverse). We give a precise definition of this algorithm in Section 3 and prove there that it is an approximate pseudo-inverse if the smoothing operator satisfies a rather simple condition. In Section 2 we give some examples of local approximate pseudo-inverses, those which do have a sparse matrix representation. First, though, we show that the above definition is both correct and useful.

Theorem 1. If $Z$ is an approximate pseudo-inverse of $A$, then algorithm API-DC converges at the geometric rate $\epsilon$ for any $x^0$ to an $x$ such that $\|y - Ax\|$ is minimized. If $x^0 = 0$, then $x$ is the Moore-Penrose pseudo-inverse solution $x^+ = A^+y$.

Proof. To prove convergence, observe that two iterations of API-DC yield the identity

$$x^{n+1} - x^n = (Z - ZAZ)(y - Ax^{n-1}).$$

The fact that $Z$ is an approximate pseudo-inverse leads to the inequality

$$\|x^{n+1} - x^n\| \le \epsilon\,\|Z(y - Ax^{n-1})\| = \epsilon\,\|x^n - x^{n-1}\|,$$

from which it follows that $\|x^{n+1} - x^n\| \le \epsilon^n\,\|x^1 - x^0\|$. This proves that $x^n$ converges geometrically to an element $x \in X$. Letting $n \to \infty$ in $x^{n+1} - x^n = Z(y - Ax^n)$ yields $Z(y - Ax) = 0$. That is, $r = y - Ax \in \mathcal{N}(Z)$, which is perpendicular to $\mathcal{R}(A)$. It follows that $\|r\|$ is minimal.

If $x^0 = 0$, then $x^n \in \mathcal{R}(Z) \subset \mathcal{N}(A)^\perp$ for all $n$, and the same holds for the limit $x$. It follows that $x \perp (x - x')$ for any $x'$ such that $Ax' = Ax$, so that $\|x'\|^2 = \|x\|^2 + \|x' - x\|^2 \ge \|x\|^2$. This means that $\|x\|$ is minimized, and $x = x^+ = A^+y$, completing the proof.

We observe that if $x^0 \ne 0$, algorithm API-DC converges to the point $x$ closest to $x^0$ with the property that $\|y - Ax\|$ is minimized. This is useful in practice, for we sometimes want to determine the smallest perturbation of a given function $x^0$ that satisfies, as nearly as possible, the constraint $Ax = y$. If a numerical solution of a partial differential equation is to satisfy a side condition, we may wish to execute such a projection every time step, making it important to have a fast algorithm for executing this projection.
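The projection property can be seen on a deliberately tiny hypothetical case: a single side condition $x_1 + x_2 = 1$ and a given function $x^0 = (3, 0)$. The sketch below runs API-DC from $x^0$ with $Z = 0.25A^*$ (so that $\epsilon = 0.5$) and compares the limit with the closed form $x^0 + A^+(y - Ax^0)$; both should give $(2, -1)$, the point on the constraint line nearest to $x^0$.

```python
import numpy as np

A = np.array([[1.0, 1.0]])        # one linear side condition: x_1 + x_2 = y
y = np.array([1.0])
x0 = np.array([3.0, 0.0])         # given function, violates the constraint

Z = 0.25 * A.T                    # Z = gamma A^*; here eps = |1 - 0.25 * 2| = 0.5

x = x0.copy()
for _ in range(60):               # API-DC started from x0 instead of 0
    x = x + Z @ (y - A @ x)

# Closed form of the same projection: x = x0 + A^+ (y - A x0)
x_closed = x0 + np.linalg.pinv(A) @ (y - A @ x0)
```

The correction $x - x^0$ lies in $\mathcal{R}(A^*)$, i.e. it is orthogonal to the null space spanned by $(1, -1)$, which is exactly why the limit is the nearest admissible point.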

2.  LOCAL  APPROXIMATE  PSEUDO-INVERSES 

If the singular linear operator $A: X \to Y$ and its Hermitian conjugate $A^*: Y \to X$ both have sparse matrix representations, it is highly likely that there are natural integer-valued metrics on the bases $\{\phi_i\}$ and $\{\psi_i\}$ of $X$ and $Y$ such that the non-zero elements of any row correspond to basis elements a distance at most $2q$ apart, for some integer $q$. When this is the case we will say that the operators are $q$-local. For example, the finite element discrete Laplacian on a two-torus defined by the 9-point operator


is  1-local  if  the  metric  on  the  discrete  two-torus  is  chosen  appropriately.  We  observe  that 
this  operator  is  singular,  for  it  maps  any  constant  function  on  the  two-torus  to  the  null 
function. 
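The printed 9-point stencil did not survive in this copy, so the sketch below assumes the standard bilinear finite element stiffness stencil $\frac{1}{3}\,[-1,-1,-1;\;-1,8,-1;\;-1,-1,-1]$ and takes the Chebyshev (max) distance on the periodic grid as the integer metric. It assembles the operator on a hypothetical 6 by 6 torus and checks both claims: constants are mapped to zero (so the operator is singular, here with a one-dimensional null space), and the nonzeros of every row lie within mutual distance $2q = 2$.

```python
import numpy as np
from itertools import product

# Assumed stencil: bilinear finite element stiffness stencil on a uniform grid.
STENCIL = np.array([[-1.0, -1.0, -1.0],
                    [-1.0,  8.0, -1.0],
                    [-1.0, -1.0, -1.0]]) / 3.0

def torus_matrix(m):
    """Assemble the 9-point operator on an m x m discrete two-torus (row-major numbering)."""
    A = np.zeros((m * m, m * m))
    for i, j in product(range(m), range(m)):
        for di, dj in product((-1, 0, 1), (-1, 0, 1)):
            col = ((i + di) % m) * m + (j + dj) % m
            A[i * m + j, col] += STENCIL[di + 1, dj + 1]
    return A

def torus_distance(p, q, m):
    """Assumed integer metric: Chebyshev distance between grid points on the torus."""
    return max(min(abs(a - b), m - abs(a - b)) for a, b in zip(p, q))

def max_row_spread(A, m):
    """Largest distance between two basis elements that share a nonzero row."""
    spread = 0
    for row in range(A.shape[0]):
        pts = [divmod(int(c), m) for c in np.nonzero(A[row])[0]]
        spread = max(spread, max(torus_distance(p, q, m) for p in pts for q in pts))
    return spread

m = 6
A = torus_matrix(m)
ones = np.ones(m * m)
null_residual = np.abs(A @ ones).max()   # constants are mapped to the null function
rank = np.linalg.matrix_rank(A)          # null space spanned by the constants only
spread = max_row_spread(A, m)            # equals 2q with q = 1
```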

For nonsingular problems the orthogonality requirements are automatically satisfied for nonsingular $Z$, and the definition reduces to

$$\|I - ZA\| \le \epsilon.$$

We refer to a linear operator $Z$ satisfying this condition as an approximate inverse of $A$. Any convergent stationary iterative algorithm for the solution of $Ax = y$ can be rephrased in this form for some $Z$ (and suitable choice of norm). The Jacobi method, for example, uses $Z = D^{-1}$, where $D$ denotes the diagonal of $A$. For further discussion of the concept of an approximate inverse see Noble [21, p. 258] or Benson and Frederickson [6].
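To make the Jacobi example concrete, the sketch below takes a hypothetical strictly diagonally dominant 4 by 4 matrix, forms $Z = D^{-1}$, measures $\|I - ZA\|$ in two norms, and runs the Jacobi iteration written as the defect correction $x \leftarrow x + Z(b - Ax)$.

```python
import numpy as np

# Hypothetical strictly diagonally dominant matrix (4-cycle graph Laplacian plus 2I).
A = np.array([[ 4.0, -1.0,  0.0, -1.0],
              [-1.0,  4.0, -1.0,  0.0],
              [ 0.0, -1.0,  4.0, -1.0],
              [-1.0,  0.0, -1.0,  4.0]])
Z = np.diag(1.0 / np.diag(A))              # Jacobi: Z = D^{-1}
M = np.eye(4) - Z @ A                      # I - ZA

eps_inf = np.abs(M).sum(axis=1).max()      # ||I - ZA|| in the max-row-sum norm
eps_2 = np.linalg.norm(M, 2)               # ||I - ZA|| in the spectral norm

# Jacobi iteration written as defect correction x <- x + Z (b - A x).
x_true = np.array([1.0, 2.0, 3.0, 4.0])
b = A @ x_true
x = np.zeros(4)
for _ in range(60):
    x = x + Z @ (b - A @ x)
```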

The LSq approximate pseudo-inverse of $A$ is the operator $Z = \Pi_X B \Pi_Y$, where $\Pi_X$ and $\Pi_Y$ are the projections onto $\mathcal{N}^\perp(A)$ (the orthogonal complement of $\mathcal{N}(A)$) and $\mathcal{R}(A)$, respectively, and $B$ minimizes the Frobenius norm

$$\|I - BA\|_F$$

subject to the constraint that $B$ be $q$-local.

When $A$ is nonsingular, the projections are trivial and $Z$ reduces to the LSq approximate inverse discussed by Benson and Frederickson [6] and Benson et al. [7]. Each row of $B$ can be determined independently and potentially in parallel. Local approximate inverses such as LSq work well for problems such as spline interpolation, where the solution at a mesh point depends mainly on data at nearby points (see Benson and Frederickson [6]). As shown in Benson et al. [7], they also prove valuable in certain elliptic boundary value problem applications. In the examples which follow, the projections $\Pi_X$ and $\Pi_Y$ are inexpensive.
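In the nonsingular case the row-by-row structure of the LSq construction is easy to sketch: row $i$ of $I - BA$ is $e_i^T - b_i^TA$, so each row $b_i$ solves a small, independent least-squares problem over its allowed support. The Python sketch below assumes a 1-D index distance as the metric (so $q$-local means $b_{ij} = 0$ for $|i - j| > q$) and uses a hypothetical $\mathrm{tridiag}(1, 4, 1)$ matrix of the kind met in cubic spline interpolation; it is an illustration, not the authors' code.

```python
import numpy as np

def lsq_approximate_inverse(A, q):
    """LSq approximate inverse of a nonsingular A (projections trivial).

    Row i of B minimizes ||e_i - A^T b_i||_2, i.e. row i of ||I - BA||_F,
    subject to b_i vanishing outside the index window |i - j| <= q.
    """
    n = A.shape[0]
    B = np.zeros_like(A)
    I = np.eye(n)
    for i in range(n):                      # rows are independent: parallelizable
        S = np.arange(max(0, i - q), min(n, i + q + 1))
        coef, *_ = np.linalg.lstsq(A.T[:, S], I[i], rcond=None)
        B[i, S] = coef
    return B

# Hypothetical test matrix: tridiag(1, 4, 1), as in cubic spline interpolation.
n = 20
A = 4.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)

errors = {}
for q in (0, 1, 2, 3):
    B = lsq_approximate_inverse(A, q)
    R = np.eye(n) - B @ A
    errors[q] = (np.linalg.norm(R, "fro"), np.linalg.norm(R, 2))
```

Widening the window can only lower the Frobenius norm, since each row's minimization runs over a larger set; for this matrix, whose inverse decays rapidly away from the diagonal, a small $q$ already gives $\|I - BA\| < 1$.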

3.  THE  FAST  APPROXIMATE  PSEUDO-INVERSE  (FAPIN) 

When the operator $A$ is a finite difference (or finite element) approximation to an elliptic operator on a fine grid, it is likely that no local approximate pseudo-inverse $Z$ will be particularly efficient. In such situations we might expect that a multigrid algorithm, in which several grids are used to effectively generate a non-sparse approximate pseudo-inverse for the operator $A$, would be appropriate.

We  present  convergence  results  below  for  FAPIN.  For  related  convergence  estimates, 
we  refer  the  reader  to  Bank  and  Douglas  [1]  or  McCormick  [19].  For  recent  applications  of 
the  algorithm  FAPIN,  see  Baumgardner  [2]  and  Baumgardner  and  Frederickson  [3]. 

A multigrid algorithm for the problem $Ax = y$ requires nested approximation sequences

$$X_0 \subset X_1 \subset \cdots \subset X_k \subset \cdots \subset X, \qquad Y_0 \subset Y_1 \subset \cdots \subset Y_k \subset \cdots \subset Y$$

for both $X$ and $Y$. It is important to note that the inner product on $X_k$ and $Y_k$ is that induced by containment. Denote by $P_k: Y_k \to Y_{k-1}$ a restriction operator. We have often found it advantageous to use as restriction operator $P_k$ the transpose of the interpolation operator defined by containment on a local basis for $Y_k$. The important point is that the approximations $A_k: X_k \to Y_k$ to our original problem are those defined by

$$A_{k-1}: X_{k-1} \to Y_{k-1}: \; x \mapsto P_k A_k x, \qquad k = 1, 2, \ldots,$$

or equivalently by the diagram

$$\begin{array}{ccc} X_k & \xrightarrow{\;A_k\;} & Y_k \\ \subset\big\uparrow & & \big\downarrow P_k \\ X_{k-1} & \xrightarrow{\;A_{k-1}\;} & Y_{k-1} \end{array}$$
We observe that any $x \in X_k$ has a unique representation

$$x = x' + x'', \qquad x' \in \mathcal{N}(A_{k-1}), \quad x'' \perp \mathcal{N}(A_{k-1}),$$

provided that $k > 0$. To simplify the discussion that follows we extend this notation to $k = 0$ with the observation that $A_{-1} = 0$ implies $x'' = 0$ for $k = 0$. Using this notation, the convergence condition on the smoothing operator can be stated precisely.

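Definition. We will call $Z_k: Y_k \to X_k$ a nested approximate pseudo-inverse of $A_k: X_k \to Y_k$ if there is an $\epsilon < 1$, independent of $k$, such that

$$\|(I - Z_kA_k)x\|^2 \le \epsilon^2\|x'\|^2 + \|x''\|^2 \quad \forall x \perp \mathcal{N}(A_k),$$
$$\mathcal{R}(Z_k) \perp \mathcal{N}(A_k), \qquad \mathcal{N}(Z_k) \perp \mathcal{R}(A_k).$$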

Experimentally we observe excellent convergence when $A$ is the finite element discrete Laplacian and $Z_k = \mathrm{LSq}(A_k)$ is the nested approximate pseudo-inverse. Similar results were obtained with the two-sphere discretization described by Baumgardner and Frederickson [3].

Definition. By a fast approximate pseudo-inverse to $A_k$ we mean a linear operator $F_k: Y_k \to X_k$ constructed from a nested approximate pseudo-inverse $Z_k$ recursively by $F_0 = Z_0$ and

$$F_k = Z_k + (I - Z_kA_k)F_{k-1}P_k\Pi_k,$$

where $\Pi_k: Y_k \to Y_k$ denotes the projection onto the range of $A_k$.

When the algorithm FAPIN is implemented on a parallel computer, it is convenient to represent each of the operators $A_k$, $P_k$, $\Pi_k$ and $Z_k$ as separate procedures operating on the coefficient arrays which represent the elements $x_k \in X_k$ and $y_k, r_k \in Y_k$. (In effect these procedures describe sparse matrix representations of the operators.) From this viewpoint, the containment $X_{k-1} \subset X_k$ is usually nontrivial, for the basis used to represent $X_{k-1}$ is usually not a subset of a basis for $X_k$. In the pseudo-code for FAPIN which follows, we denote by $Q_k$ the representation on the coefficient space of the containment operator.
Theorem 2. If for some $\epsilon < 1$, $Z_j$ is a nested approximate pseudo-inverse for $0 \le j \le k$, then $F_k$ is an approximate pseudo-inverse with the same $\epsilon$.

Proof. We construct a proof by induction. The case $k = 0$ follows directly from the definition of a nested approximate pseudo-inverse, for in this case $F_0 = Z_0$, and with $x'' = 0$, $Z_0$ is seen to be an approximate pseudo-inverse with rate $\epsilon$. Assume the conclusion is valid for $k - 1$, and let $x = x' + x''$ as in the definition of $Z_k$. Observe first that, since $\Pi_kA_k = A_k$, any $x \in \mathcal{R}(F_k)$ satisfies

$$(I - F_kA_k)x = (I - Z_kA_k)(I - F_{k-1}P_kA_k)x = (I - Z_kA_k)(I - F_{k-1}A_{k-1})x.$$

Thus $(I - F_kA_k)x' = (I - Z_kA_k)x'$ and $(I - F_kA_k)x'' = (I - Z_kA_k)\tilde{x}''$, where $\tilde{x}'' = (I - F_{k-1}A_{k-1})x''$ is perpendicular to $\mathcal{N}(A_{k-1})$ and satisfies $\|\tilde{x}''\| \le \epsilon\|x''\|$ by induction on $k$. Thus

$$\|(I - F_kA_k)x\|^2 = \|(I - Z_kA_k)(x' + \tilde{x}'')\|^2 \le \epsilon^2\|x'\|^2 + \|\tilde{x}''\|^2 \le \epsilon^2(\|x'\|^2 + \|x''\|^2) = \epsilon^2\|x\|^2.$$

Observe that $\mathcal{R}(F_k) \perp \mathcal{N}(A_k)$ follows from $\mathcal{R}(Z_k) \perp \mathcal{N}(A_k)$ by induction. Since $F_k - F_kA_kF_k = (I - F_kA_k)F_k$, we thus have

$$\|(I - F_kA_k)F_ky\| \le \epsilon\,\|F_ky\|$$

for any $y \in Y_k$. This completes the proof.

4.  PARALLEL  IMPLEMENTATION 

The fast approximate pseudo-inverse FAPIN was implemented on the 64-node iPSC hypercube at the Christian Michelsen Institute (CMI) in Bergen, Norway, as one of the early demonstration programs on that machine [14,15]. More particularly, it has served as an early demonstration of the high-level hypercube library developed at CMI. The spaces $X_k$ and $Y_k$ are spread over the nodes of the hypercube using a higher-dimensional Gray code, in such a way that all communication is between directly connected nodes. As a residual is repeatedly restricted to ever coarser grids, a problem arises when the number of grid points is equal to the number of hypercube nodes in use. Our solution to that problem was to change the Gray code before the next application of the restriction operator, in such a way that the problem was actually being solved on a smaller hypercube, a sub-cube of the previous one. In fact, the problem is being solved on several parallel sub-cubes simultaneously. Thus when the Gray code is returned to its previous value just before the corresponding interpolation, the partial solution is already on the nodes where it is needed. This approach is somewhat different from the techniques used by Thole [23], McBryan and Van de Velde [18] or Chan and Saad [10,11].
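The Gray-code idea can be illustrated with the standard binary-reflected Gray code; the CMI library's actual mapping is not described in this paper, so this is only an assumed, textbook version. For a periodic 2-D array of $2^{b_x} \times 2^{b_y}$ blocks on a 64-node hypercube, Gray-coding each block coordinate and concatenating the bits places every pair of grid neighbors (including wrap-around neighbors) on nodes whose addresses differ in exactly one bit, that is, on directly connected nodes.

```python
def gray(i):
    """Binary-reflected Gray code of the integer i."""
    return i ^ (i >> 1)

def node_of(ix, iy, by):
    """Hypercube node for block (ix, iy): Gray-code each coordinate, concatenate the bits."""
    return (gray(ix) << by) | gray(iy)

def hamming(a, b):
    """Number of differing address bits, i.e. the hypercube distance between nodes."""
    return bin(a ^ b).count("1")

bx, by = 3, 3                    # 8 x 8 blocks on a 64-node (6-dimensional) hypercube
px, py = 1 << bx, 1 << by

nodes = {node_of(i, j, by) for i in range(px) for j in range(py)}

# Largest hypercube distance between the nodes of any two periodic grid neighbours.
worst = max(
    hamming(node_of(i, j, by), node_of((i + di) % px, (j + dj) % py, by))
    for i in range(px)
    for j in range(py)
    for di, dj in ((1, 0), (0, 1))
)
print(len(nodes), worst)
```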

For an example we solve Poisson’s problem with periodic boundary conditions, taking a few randomly distributed charges as the function $y$ (see Figure 1). We solved the problem using a bilinear finite element approximation on a sequence of increasingly fine grids up to 512 by 512. The accuracy obtained per iteration was almost independent of the order $N$ of the problem as the grid became finer, and the cost of the computation was almost perfectly linear in $N$.


Benson  and  Frederickson 




Figure 1. A part of the pseudo-solution $u^+$ to Poisson’s problem $\nabla^2 u = v$ in two dimensions with periodic boundary conditions. We use a finite element discretization on a 512 by 512 grid and three iterations of the multigrid algorithm FAPIN.

5.  CONCLUSION 

Through the use of an approximate pseudo-inverse, an iterative scheme was presented which gives the Moore-Penrose pseudo-inverse solution to certain singular systems $Ax = y$. Based on least squares local approximate inverses, an $O(N)$ algorithm was developed for large sparse singular problems arising from certain finite element discretizations. The methods presented are well suited to parallel computation using, for example, a hypercube.
1. Introduction

Most massive computational tasks facing us today have one feature in common: they are mainly governed by local relations in some low (e.g. 2 or 3) dimensional space or grid. Such are all differential problems, including flows, electromagnetism, magnetohydrodynamics, quantum mechanics, structural mechanics, tectonics, tribology, general relativity, etc., etc. Such are also many statistical, or partly differential partly statistical, problems (e.g. in statistical mechanics, field theory, turbulence), and many non-differential problems like those in geodesy, multivariate interpolation, image reconstruction, pattern recognition, design, optimization and constrained optimization (e.g., traveling salesman, VLSI design, the three-dimensional folding of proteins, spin glasses, linear programming transportation), optimal control, network problems, and so on. This common feature can be exploited very effectively by multi-level (multigrid) solvers, which combine local processing on different scales with various inter-scale interactions. Even when the governing relations are not strictly local (e.g., integral and integro-differential equations, x-ray crystallography, tomography, econometrics), any problem with a multitude of unknowns is likely to have some internal structure which can be used by multilevel solvers. In many cases, the computational cost of such solvers has been shown to be essentially as low as the cost can ever be; that is, the amount of processing is not much larger than the amount of real physical information. The parallel-processing complexity of multilevel solvers is just poly-log in (i.e., polynomial in the logarithm of) the number of unknowns.


The  main  body  of  this  article  is  a  modification  of  [32]. 
84-C-0036.
This article is a brief survey of this field of study, emphasizing basic ideas, important recent developments (especially at the Weizmann Institute) and their implications. No attempt is made to scan the fast-growing multigrid literature. A list of more than 600 papers will appear in [24]; see also the multigrid books [21], [25], [7], [28], [20], [26], [22], [39] and the present proceedings. For a first elementary acquaintance with classical multigrid and its chief quantitative analytical tool, the local mode analysis, see, for example, [39] or [7, §1].

Multigrid methods were first developed (see historical note, Sec. 16) as fast solvers for discretized linear elliptic PDEs (see Secs. 3, 4, 5), then extended to non-elliptic (Sec. 6), nonlinear (Sec. 7) and time-dependent (Sec. 10) problems, and to more general algebraic systems (Secs. 2, 11). The multigrid apparatus can also be used to obtain improved discretization schemes, especially for non-elliptic, highly indefinite, highly oscillatory, and ill-posed problems, and to create small-storage algorithms, cheap high-order approximations and/or highly efficient local grid adaptation (Secs. 7, 8). Within the work of solving a single boundary value problem, for negligible extra cost, multigrid solvers can incorporate continuation processes, local grid adaptations, system identifications, free boundary tracing and so on; and for much smaller work, a solved problem with modified data can be re-solved, time and again, thus allowing, for example, on-line design of complicated structures and real-time optimal control (Sec. 9).

Recently, mainly in response to current computational bottlenecks in theoretical physics, new types of multi-level methods have been developed for solving large lattice equations (e.g., Dirac equations in gauge fields, Sec. 11); for calculating determinants (Sec. 12) and accelerating Monte-Carlo iterations (Sec. 14 and App. B); and for discrete-state and highly non-quadratic minimization (Sec. 13).

The discrete-state minimization techniques, based on multi-level stochastic interactions, are evolving into a general approach for rapidly solving geometrically based optimization problems afflicted with multi-scale local optima (nested attraction basins at all scales). The role of multileveling for such problems is not just to accelerate local convergence, but also (and more importantly) to enable the solver to escape local optima, especially those with large attraction basins, for which all previous techniques, including “simulated annealing”, were ineffective. Applications of these multilevel stochastic optimizations extend into such diverse fields as spin systems, image processing, crystallography, protein folding, and combinatorial optimization.

The Appendices describe in greater detail some important recent developments. The execution of multi-integrations on $n$ gridpoints, required in solving integral equations and in simulations of many-body interactions, is reduced from $O(n^2)$ to $O(n)$ or $O(n \log n)$ operations, provided the kernel is suitably smooth (App. A). For problems in statistical mechanics and field theory, multilevel Monte-Carlo techniques, including “stochastic coarsening”, can simultaneously eliminate several kinds of slowness (critical slowing, domain vastness, slow balancing of deviations) and very inexpensively incorporate dynamic fermions; they create, in fact, a general scheme for computational derivations of macroscopic dynamics from microscopic laws (App. B). Procedures for attaining highest-efficiency fluid dynamics solvers are surveyed in App. C, emphasizing the use of multigrid as a total approach.

Other quite recent developments to note are: the rigorous justification of the local mode analysis, including the proof that the exact operation count per gridpoint is predictable and essentially independent of boundary shapes and boundary conditions (Sec. 5); the reduction of the design of relaxation schemes for general PDE systems to the design of schemes for simple scalar equations (also Sec. 5); the treatment of non-elliptic steady-state problems (Sec. 6); extremely efficient multilevel techniques for time-dependent problems (Sec. 10); fast calculation of determinants of systems of grid equations (Sec. 12); and multilevel linear programming (Sec. 15).


2.  Slow  Components  in  Matrix  Iterations 


Consider the real linear system of equations

$$Ax = b, \tag{2.1}$$

where $A$ is a general $n \times m$ real matrix. For any approximate solution vector $\tilde{x}$, denote the error vector by $e = x - \tilde{x}$, and the vector of residuals by $r = Ae = b - A\tilde{x}$. Given $\tilde{x}$, it is usually easy to calculate $r$, especially when $A$ is a sparse matrix; e.g., when $A$ is based on local relations. One can then easily use these residuals to correct $\tilde{x}$; for instance, by taking one residual $r_i$ at a time, and replacing $\tilde{x}$ by $\tilde{x} + (r_i/a_ia_i^T)\,a_i^T$, where $a_i$ is the $i$-th row of $A$ (thus projecting $\tilde{x}$ onto the hyperplane of solutions to the $i$-th equation). Doing this for $i = 1, \ldots, n$ is called a Kaczmarz relaxation sweep. It can be shown (Theorem 3.4 in [9]) that the convergence to a solution $x$ (if one exists) of a sequence of such (or other) relaxation sweeps should slow down only when

$$|\hat{r}| \ll |e|, \tag{2.2}$$

where $\hat{r}$ is the normalized residual vector ($\hat{r}_i = a_ie/|a_i|$) and $|\cdot|$ is the Euclidean ($\ell_2$) norm. From the normalization of $\hat{r}$ it is clear that, for most error vectors, $|\hat{r}|$ is comparable to $|e|$; (2.2) can clearly hold only for special error vectors, dominated by special components (eigenvectors with small eigenvalues), whose number is small. Thus, when relaxation slows down, the error can be approximated by vectors in some much lower-dimensional space, called the space of slow components.
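The slowing down described above is easy to observe numerically. The Python sketch below applies Kaczmarz sweeps to a hypothetical system with local relations (the 1-D three-point Laplacian, $n = 100$) starting from a random initial error, and tracks both $|e|$ and the ratio $|\hat{r}|/|e|$: the ratio falls well below 1 within a few dozen sweeps while $|e|$ itself stalls, because the remaining error is dominated by the smooth eigenvectors with small eigenvalues.

```python
import numpy as np

def kaczmarz_sweep(A, b, x):
    """One Kaczmarz sweep: for each row i, project x onto the hyperplane a_i x = b_i."""
    x = x.copy()
    for i in range(A.shape[0]):
        a = A[i]
        x += (b[i] - a @ x) / (a @ a) * a
    return x

def slowness_ratio(A, e):
    """|r_hat| / |e|, with the normalized residual r_hat_i = a_i e / |a_i|."""
    r_hat = (A @ e) / np.linalg.norm(A, axis=1)
    return np.linalg.norm(r_hat) / np.linalg.norm(e)

n = 100
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # hypothetical local relations
rng = np.random.default_rng(0)
x_exact = rng.standard_normal(n)
b = A @ x_exact

x = np.zeros(n)                      # initial error e = x_exact, a random vector
e0 = x_exact - x
ratio_before = slowness_ratio(A, e0)
for _ in range(50):
    x = kaczmarz_sweep(A, b, x)
e50 = x_exact - x
ratio_after = slowness_ratio(A, e50)

print(f"|r_hat|/|e|: {ratio_before:.3f} -> {ratio_after:.3f}")
print(f"|e| after 50 sweeps: {np.linalg.norm(e50) / np.linalg.norm(e0):.2f} of its initial size")
```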
The concrete characterization of this space, or of “slowness”, depends on the nature of the problem, and is sometimes far from trivial (see e.g. the “multiple representations” in Sec. 8). In many cases of interest, however, we will now see that slowness simply means smoothness (see Sec. 11 for a generalization).


3.  Discretized  Differential  Equations 

In case the system (2.1) represents a discretization of a stationary partial differential equation $Lu = F$ on some grid with meshsize $h$, we customarily rewrite it in the form

$$L^h u^h = F^h, \tag{3.1}$$

where $u^h$ is a grid function. Barring cases of alignment (see Sec. 6), such a system is numerically stable if and only if $L^h$ has a good measure of ellipticity on scale $h$, inherited either from a similar $h$-ellipticity measure of $L$, or (e.g. in case $L$ is non-elliptic) from artificial ellipticity introduced either by “upstream” or “flux splitting” differencing or through explicit “artificial viscosity” terms. (Ellipticity measures on uniform grids, and their scale dependence, are discussed in [7, §2.1].)

For any $h$-elliptic operator $L^h$, relation (2.2) holds if and only if the error is smooth on the scale of the grid; i.e., iff its differences over neighboring grid points are small compared with itself. (This in fact is exactly the meaning of $h$-ellipticity.) The space of slow components can therefore be defined as the space of grid-$h$ functions of the form $I_H^h v^H$, where $v^H$ are functions on a coarser grid, with meshsize $H > h$, and $I_H^h$ is an interpolation operator from grid $H$ to grid $h$.

The coarse grid should not be too coarse; $H = 2h$ or so is about optimal. On the one hand, it keeps $H$ close enough to $h$ that all errors which cannot be approximated on grid $H$ are so highly oscillatory that their convergence by relaxation on grid $h$ must be very fast (convergence factor 0.25 per sweep, typically). On the other hand, $H = 2h$ already yields a small enough number of coarse grid points, so that the work associated with the coarse grid (in the algorithms described below) is already just a fraction of the relaxation work on the fine grid.

Let $\tilde{u}^h$ be an approximation to the solution $u^h$, obtained for example after several relaxation sweeps. To define a coarse-grid approximation $v^H$ to the smooth error $v^h = u^h - \tilde{u}^h$, one approximates the “residual equation”

$$L^h v^h = r^h := F^h - L^h\tilde{u}^h \tag{3.2}$$

by the coarse-grid equation

$$L^H v^H = I_h^H r^h, \tag{3.3}$$

where $I_h^H$ is a fine-to-coarse interpolation (local averaging in fact, sometimes called “weighting” or “restriction”) and $L^H$ is a coarse grid approximation to $L^h$. One can either use the Galerkin-type approximation $L^H = I_h^H L^h I_H^h$, or derive $L^H$ directly from $L$ by differencing (replacing derivatives by finite differences) on grid $H$, which is usually far less expensive in computer time and storage. A generally sensible approach is to use compatible coarsening, i.e., the Galerkin approach when $L^h$ itself has been constructed by Galerkin (or variational) discretization, and direct differencing in case $L^h$ itself is so derived, using the same discretization order and “double discretization” (see Sec. 10) as used by $L^h$, etc. (see discussion in [7, §11]). $L^H$ can then, fully automatically, be produced by the same routines that produced $L^h$ (even in nonlinear problems: see Sec. 7).

A coarse grid correction is the replacement of $\tilde{u}^h$ by $\tilde{u}^h + I_H^h v^H$, where $v^H$ is an (approximate) solution of (3.3). Using alternately a couple of relaxation sweeps and a coarse grid correction is called a two-grid cycle.
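A two-grid cycle for a 1-D model problem fits in a few lines. The sketch below assumes $-u'' = F$ on $(0, 1)$ with zero boundary values, lexicographic Gauss-Seidel relaxation, full weighting as $I_h^H$, linear interpolation as $I_H^h$, and $L^H$ obtained by direct differencing on grid $H$. The coarse-grid equation (3.3) is solved exactly here, which the next section shows to be unnecessary.

```python
import numpy as np

def laplacian_1d(n, h):
    """L^h for -u'' on n interior points of a uniform grid, zero Dirichlet boundaries."""
    return (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

def gauss_seidel(L, u, f, sweeps):
    """Lexicographic Gauss-Seidel relaxation sweeps (returns a new array)."""
    u = u.copy()
    for _ in range(sweeps):
        for i in range(len(u)):
            u[i] += (f[i] - L[i] @ u) / L[i, i]
    return u

def restrict(r):
    """I_h^H: full weighting (1/4, 1/2, 1/4); 2*nc + 1 fine points -> nc coarse points."""
    return 0.25 * r[0:-2:2] + 0.5 * r[1::2] + 0.25 * r[2::2]

def interpolate(vc):
    """I_H^h: linear interpolation; nc coarse points -> 2*nc + 1 fine points."""
    p = np.concatenate(([0.0], vc, [0.0]))   # include the zero boundary values
    v = np.zeros(2 * len(vc) + 1)
    v[1::2] = vc                             # points shared with the coarse grid
    v[0::2] = 0.5 * (p[:-1] + p[1:])         # fine points between coarse points
    return v

def two_grid_cycle(u, f, h, nu1=1, nu2=1):
    Lh = laplacian_1d(len(u), h)
    u = gauss_seidel(Lh, u, f, nu1)                # relaxation on grid h
    r = f - Lh @ u                                 # residual r^h
    LH = laplacian_1d((len(u) - 1) // 2, 2 * h)    # L^H by direct differencing on grid H
    vH = np.linalg.solve(LH, restrict(r))          # (3.3), solved exactly in this sketch
    u = u + interpolate(vH)                        # coarse grid correction
    return gauss_seidel(Lh, u, f, nu2)             # relaxation on grid h

n, h = 63, 1.0 / 64
x = np.linspace(h, 1.0 - h, n)
f = np.pi**2 * np.sin(np.pi * x)
u_h = np.linalg.solve(laplacian_1d(n, h), f)       # exact discrete solution, for reference

u = np.zeros(n)
errors = []
for _ in range(15):
    u = two_grid_cycle(u, f, h)
    errors.append(np.linalg.norm(u - u_h) / np.linalg.norm(u_h))
```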


4.  Multigrid  Algorithms 

There is no need, of course, to solve (3.3) exactly. Its approximate solution is most efficiently obtained by again alternately using relaxation sweeps (now on grid $H$) and corrections from a still coarser grid ($2H$). We thus construct a sequence of grids, each typically being twice as coarse as the former, with the coarsest grid containing so few equations that they can be solved (e.g., by Gaussian elimination) in negligible time.

A multigrid cycle for improving an approximate solution to (3.1) is recursively defined as follows: If $h$ is the coarsest grid, solve (3.1) by whatever method. If not, denoting by $H$ the next coarser grid, perform the following three steps: (A) $\nu_1$ relaxation sweeps on grid $h$; (B) a coarse grid correction, in which (3.3) is approximately solved by starting with $v^H = 0$ and improving it by $\gamma$ multigrid cycles; (C) $\nu_2$ additional relaxation sweeps on grid $h$.
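The recursive definition translates almost line by line into code. In this Python sketch (an assumed 1-D model problem $-u'' = F$ with zero boundary values, Gauss-Seidel relaxation, full weighting, linear interpolation and direct differencing on each coarse grid), the parameter `gamma` selects the cycle type, and the grid with 3 interior points is treated as the coarsest and solved directly.

```python
import numpy as np

def laplacian_1d(n, h):
    """L^h for -u'' on n interior points, zero Dirichlet boundary values."""
    return (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

def gauss_seidel(L, u, f, sweeps):
    u = u.copy()
    for _ in range(sweeps):
        for i in range(len(u)):
            u[i] += (f[i] - L[i] @ u) / L[i, i]
    return u

def restrict(r):
    """Full weighting I_h^H."""
    return 0.25 * r[0:-2:2] + 0.5 * r[1::2] + 0.25 * r[2::2]

def interpolate(vc):
    """Linear interpolation I_H^h."""
    p = np.concatenate(([0.0], vc, [0.0]))
    v = np.zeros(2 * len(vc) + 1)
    v[1::2] = vc
    v[0::2] = 0.5 * (p[:-1] + p[1:])
    return v

def mg_cycle(u, f, h, nu1=1, nu2=1, gamma=1):
    """Recursive multigrid cycle: gamma = 1 gives V(nu1, nu2), gamma = 2 gives W(nu1, nu2)."""
    Lh = laplacian_1d(len(u), h)
    if len(u) <= 3:                               # coarsest grid: solve directly
        return np.linalg.solve(Lh, f)
    u = gauss_seidel(Lh, u, f, nu1)               # (A) nu1 relaxation sweeps
    rH = restrict(f - Lh @ u)                     # right-hand side of (3.3)
    vH = np.zeros(len(rH))                        # (B) start from v^H = 0 ...
    for _ in range(gamma):                        #     ... and apply gamma cycles
        vH = mg_cycle(vH, rH, 2 * h, nu1, nu2, gamma)
    u = u + interpolate(vH)
    return gauss_seidel(Lh, u, f, nu2)            # (C) nu2 relaxation sweeps

n, h = 127, 1.0 / 128
x = np.linspace(h, 1.0 - h, n)
f = np.pi**2 * np.sin(np.pi * x)
u_h = np.linalg.solve(laplacian_1d(n, h), f)

results = {}
for name, gamma in (("V(1,1)", 1), ("W(1,1)", 2)):
    u = np.zeros(n)
    for _ in range(15):
        u = mg_cycle(u, f, h, gamma=gamma)
    results[name] = np.linalg.norm(u - u_h) / np.linalg.norm(u_h)
```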

For $\gamma = 1$ this multigrid cycle is called $V(\nu_1, \nu_2)$; for $\gamma = 2$ it is called $W(\nu_1, \nu_2)$. Other cycles, including accommodative ones, are described in [7, §6.2].
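The remark in Sec. 3 that the coarse grids cost only a fraction of the fine-grid work can be made quantitative for these cycles with a rough operation count. Assume standard coarsening $H = 2h$ in $d$ dimensions, work per grid visit proportional to the number of grid points, and a negligible coarsest-grid solve. Grid $\ell$ ($\ell = 0$ being the finest) then has about $2^{-d\ell}$ times as many points as the finest grid and is visited $\gamma^{\ell}$ times per cycle, so that

$$W_{\mathrm{cycle}} \approx w_0 \sum_{\ell \ge 0} \left(\gamma\, 2^{-d}\right)^{\ell} = \frac{w_0}{1 - \gamma\, 2^{-d}}, \qquad \gamma < 2^d,$$

where $w_0$ is the work of the $\nu_1 + \nu_2$ sweeps and the grid transfers on the finest grid. For $d = 2$ this gives $\tfrac{4}{3}w_0$ for V cycles and $2w_0$ for W cycles. For $d = 1$ and $\gamma = 2$ every level contributes $w_0$, so the work of a W cycle grows like $w_0$ times the number of levels.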

The full multigrid algorithm $N$-FMG for solving (3.1), when $h$ is not the coarsest grid
and $H$ is the next coarser one, is recursively defined as follows: (A) Solve $L^H u^H = F^H$ by a
similar $N$-FMG algorithm, where $F^H = I_h^H F^h$. ($F^H$ may also be derived directly from $F$.)
(B) Start with the first approximation $\tilde u^h = \Pi_H^h \tilde u^H$, and improve it by $N$ multigrid cycles.
The solution interpolation $\Pi_H^h$ usually has a higher order than the correction interpolation
$I_H^h$ mentioned above.

For almost any discretized stationary PDE problem, a 1-FMG algorithm, employing
cycles with $\nu_1 + \nu_2 = 2$ or 3 and $\gamma = 1$ or 2, is enough for solving (3.1) to the level of
truncation errors (i.e., to the point where the approximate solution $\tilde u^h$ satisfies
$\|\tilde u^h - u^h\| < \|u^h - u\|$, in any desired norm), provided proper relaxation and interpolation
procedures are used (see Sec. 5). Only when $L^h$ has a high approximation order $p$ may a larger $N$
in $N$-FMG be required, with $N$ growing linearly in $p$.
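The sketch below applies 1-FMG to the same kind of model problem, $-u'' = f$ on $(0,1)$ with zero boundary values. The model problem is an assumption, as are all names. The sketch makes three choices of its own:

- it uses $V(2,1)$ cycles;
- it takes the coarse right-hand sides directly from $f$, the option noted above;
- it uses cubic solution interpolation $\Pi_H^h$, of higher order than the linear correction interpolation.

The final check is the text's criterion: it compares the algebraic error $\|\tilde u^h - u^h\|$ with the discretization error $\|u^h - u\|$ in the maximum norm.

```python
import math


def relax(u, f, h, sweeps):
    """Lexicographic Gauss-Seidel for -u'' = f; boundary entries stay fixed."""
    for _ in range(sweeps):
        for i in range(1, len(u) - 1):
            u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])


def residual(u, f, h):
    r = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        r[i] = f[i] - (2 * u[i] - u[i - 1] - u[i + 1]) / (h * h)
    return r


def vcycle(u, f, h, nu1=2, nu2=1):
    """V(nu1,nu2) with full weighting and linear correction interpolation."""
    n = len(u) - 1
    if n <= 2:
        relax(u, f, h, 1)                # a single unknown: one sweep is exact
        return
    relax(u, f, h, nu1)
    r, nc = residual(u, f, h), n // 2
    rc = [0.0] * (nc + 1)
    for j in range(1, nc):
        rc[j] = 0.25 * r[2 * j - 1] + 0.5 * r[2 * j] + 0.25 * r[2 * j + 1]
    vc = [0.0] * (nc + 1)
    vcycle(vc, rc, 2 * h, nu1, nu2)
    for j in range(nc + 1):
        u[2 * j] += vc[j]
    for j in range(nc):
        u[2 * j + 1] += 0.5 * (vc[j] + vc[j + 1])
    relax(u, f, h, nu2)


def cubic_interpolate(U):
    """Solution interpolation Pi_H^h (cubic); needs at least 4 coarse nodes."""
    nc = len(U) - 1
    u = [0.0] * (2 * nc + 1)
    for j in range(nc + 1):
        u[2 * j] = U[j]
    for j in range(nc):
        if j == 0:                       # one-sided stencil at the left boundary
            w = (5 * U[0] + 15 * U[1] - 5 * U[2] + U[3]) / 16
        elif j == nc - 1:                # one-sided stencil at the right boundary
            w = (U[j - 2] - 5 * U[j - 1] + 15 * U[j] + 5 * U[j + 1]) / 16
        else:                            # centered 4-point stencil
            w = (-U[j - 1] + 9 * U[j] + 9 * U[j + 1] - U[j + 2]) / 16
        u[2 * j + 1] = w
    return u


def fmg(source, N, N0=4, cycles=1):
    """cycles-FMG for -u'' = source(x), u(0) = u(1) = 0; N must be N0 * 2^k."""
    n, u = N0, [0.0] * (N0 + 1)
    f = [source(i / n) for i in range(n + 1)]
    relax(u, f, 1.0 / n, 100)            # coarsest grid: plain sweeps suffice
    while n < N:
        n *= 2
        u = cubic_interpolate(u)         # first approximation from grid H
        f = [source(i / n) for i in range(n + 1)]
        for _ in range(cycles):
            vcycle(u, f, 1.0 / n)
    return u


def source(x):
    return math.pi ** 2 * math.sin(math.pi * x)   # exact solution: sin(pi x)


N = 64
u_fmg = fmg(source, N)
u_h = list(u_fmg)                        # iterate to the discrete solution u^h
f = [source(i / N) for i in range(N + 1)]
for _ in range(40):
    vcycle(u_h, f, 1.0 / N)
algebraic = max(abs(a - b) for a, b in zip(u_fmg, u_h))
discretization = max(abs(b - math.sin(math.pi * i / N)) for i, b in enumerate(u_h))
print(algebraic < discretization)
```

The printed value should be `True`. After one cycle per level, the algebraic error is already below the discretization error $\|u^h - u\|$, which is about $2\cdot 10^{-4}$ here.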

This means that the solution is obtained in just a few $L^h$ work units, where an $L^h$ work
unit is the amount of computer operations involved in just expressing $L^h$ at all grid points.
The only solvers with an almost comparable (but on large grids still inferior) speed are the
direct solvers based on the Fast Fourier Transform (FFT), but they are essentially limited
to equations with constant coefficients on rectangular domains and constant boundary
operators. The FMG solver, by contrast, attains the same efficiency for general nonlinear,
not necessarily elliptic, problems (see Secs. 6, 7), for any boundary shape and boundary
conditions, for compound problems (Sec. 9), for eigenproblems, and for problems including
free surfaces, shocks, re-entrant corners, discontinuous coefficients and other singularities.
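The count below gives a rough idea of what “a few work units” means. It is an added illustration that rests on three assumptions:

- standard coarsening in two dimensions, so each coarser grid has a quarter of the points of the next finer one;
- one relaxation sweep costs about one $L^h$ work unit;
- residual and transfer work is ignored.

With $\gamma$ coarse-grid cycles per level, a cycle and the 1-FMG algorithm built on it cost about

$$
W_{\text{cycle}} \approx (\nu_1+\nu_2)\sum_{\ell\ge 0}\Big(\frac{\gamma}{4}\Big)^{\ell} = \frac{\nu_1+\nu_2}{1-\gamma/4},
\qquad
W_{\text{1-FMG}} \approx W_{\text{cycle}}\sum_{\ell\ge 0}\frac{1}{4^{\ell}} = \frac{4}{3}\,W_{\text{cycle}} .
$$

With $\nu_1+\nu_2 = 3$ this is about 5.3 work units for V cycles ($\gamma = 1$) and 8 for W cycles ($\gamma = 2$).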

Moreover,  the  multigrid  solvers  can  fully  exploit  very  high  degrees  of  parallel  and/or 
vector  processing.  In  case  Lh  is  the  standard  5-point  approximation  to  the  Laplacian,  for 
example,  (3.1)  has  been  solved  on  the  CDC  CYBER  205  at  the  rate  of  5  million  equations 
per  second  [3].  Also,  for  little  extra  computer  work  these  solvers  can  incorporate  local 
grid  adaptation  (Sec.  7)  or  provide  a  sequence  of  extra  solutions  to  a  sequence  of  similar 
problems  (Sec.  9). 


5.  Performance  Prediction,  Optimization,  and  Rigorous  Analysis 

The multigrid algorithms have many parameters, including their relaxation schemes,
orders of interpolation, their treatment of boundaries and of the interior equations near
boundaries, etc. To obtain their best performance, and to debug the programs, an analytical
tool is needed which can predict, for example, the precise convergence factor per
cycle. Such a tool is the following local mode analysis.

For equations with constant coefficients on infinite uniform grids, only a few ($\ell$, say)
Fourier components of the error function $u^h - \tilde u^h$ are coupled at a time by the processes of
the two-grid cycle, and it is thus easy to calculate (usually by a small computer program)
the two-grid convergence factor (the largest among the spectral radii of the corresponding
$\ell \times \ell$ transfer matrices). For general equations in a general domain, the local two-grid
convergence factor is defined as the worst (largest) two-grid convergence factor for any
“freezing” of the equation at any given point (extending the equation at that point to the
infinite domain).

For a general elliptic system of equations $Lu = F$ with continuous coefficients, discretized
on a uniform (or continuously changing) grid in a general domain, it has been
rigorously proved [10] that for small meshsizes ($h \to 0$) the local two-grid convergence
factor is actually obtained globally, provided the algorithm is allowed to be modified near
boundaries, by adding there local relaxation sweeps that cost negligible extra work. Numerical
tests clearly show that this local relaxation is indeed sometimes necessary, e.g.,
near re-entrant corners and other singularities [2, §4]. The performance of multigrid cycles
can also be precisely predicted, either by perturbations to the two-grid analysis or by more
complex (e.g., three-level) Fourier analyses (coupling more components at a time).

Moreover, it can also be proved that the two-grid convergence factor, $\lambda$, can itself be
anticipated by the “smoothing factor” of the relaxation process, $\mu$, which can be calculated
by a much simpler local mode analysis. Namely, $\lambda = \mu^\nu$ can always be obtained,
provided $\nu$, the number of fine-grid relaxation sweeps per cycle, is not large, and provided
suitable inter-grid transfers (high enough interpolation orders) are used. Furthermore,
in case of a complicated system of $q$ differential equations, i.e., when $L$ is a $q \times q$ matrix
of differential operators, a relaxation scheme can always be constructed for which
$\mu = \max(\mu_1, \ldots, \mu_k)$, where $L_1 \cdots L_k$ is a factorization, usually into first- and second-order
scalar operators, of the $h$-principal part (the principal part on scale $h$) of the determinant
of $L$, and $\mu_i$ is the smoothing factor obtainable for a relaxation of $L_i$ (see
[7, §3.7]). Thus, the entire multigrid efficiency can be anticipated from the smoothing
factors obtainable for simple scalar operators, and the practical task then is to construct
the intergrid transfers so that $\lambda$ indeed approaches $\mu^\nu$, and then to adjust the boundary
processes until the convergence factor per multigrid cycle indeed approaches $\lambda$.

In case of uniformly elliptic problems, for example, the factors of $\det L$ are usually
Laplacians, for which the smoothing factor $\mu = .25$ is obtainable, using the (fully
parallelizable and extremely cheap) Gauss-Seidel relaxation in red-black ordering. Hence
a multigrid cycle can be constructed with convergence factors of .25 per fine-grid relaxation
sweep, or about .4 per work unit (taking coarse-grid overhead into account).
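The single-mode version of this local mode analysis fits in a few lines. The sketch is an illustration, not the analysis behind the .25 quoted above: red-black ordering couples pairs of Fourier modes and needs a somewhat larger calculation. Instead, the sketch takes $\omega$-damped Jacobi relaxation of the 5-point Laplacian and samples the high frequencies on a grid of angles. It then searches for the damping with the smallest smoothing factor. The function names and the sampling density are assumptions.

```python
import math


def jacobi_amplification(t1, t2, omega):
    """Factor by which one omega-damped Jacobi sweep for the 5-point Laplacian
    multiplies the error's Fourier mode exp(i(t1*x + t2*y)/h)."""
    return abs(1 - omega * (1 - 0.5 * (math.cos(t1) + math.cos(t2))))


def smoothing_factor(omega, m=32):
    """Worst amplification over the high frequencies max(|t1|,|t2|) >= pi/2,
    i.e. the modes that grid 2h cannot represent (full coarsening)."""
    thetas = [math.pi * k / m for k in range(-m + 1, m + 1)]   # includes 0, pi/2, pi
    return max(jacobi_amplification(t1, t2, omega)
               for t1 in thetas for t2 in thetas
               if max(abs(t1), abs(t2)) >= math.pi / 2)


mu, omega = min((smoothing_factor(k / 100), k / 100) for k in range(1, 101))
print(f"best omega = {omega:.2f}, smoothing factor = {mu:.3f}")
```

The search should land at $\omega = 0.8$ with $\mu = 0.6$. At that damping, the amplification $|1-2\omega|$ of the most oscillatory mode equals the amplification $1-\omega/2$ at the low end of the high-frequency range.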

For  highly  discontinuous  equations  or  discretizations,  the  theoretical  treatment  is  far 
less  precise,  but  practical  approaches  were  developed  [1],  successful  enough  to  yield  fairly 
general  black-box  solvers  [15]. 

Many  situations  are  analyzed  by  non-local  theories,  developed  over  a  vast  literature; 
see  e.g.  [20]  and  references  therein.  The  trouble  with  the  non-local  approach  is  that  its 
estimates  are  not  realistically  quantitative:  the  convergence  factor  per  cycle  is  indeed 
shown  to  be  bounded  away  from  1  independently  of  h,  but  its  actual  size  is  either  not 
specified  or  is  so  close  to  1  that  it  is  useless  for  practical  purposes  (such  as  selecting, 
optimizing  and  debugging  the  various  processes),  and  no  one  believing  it  would  use  the 
algorithm.  In  fact,  it  led  to  several  practical  misconceptions  [7,  §14]. 

The theory in [9] gives rigorous, realistic two-grid convergence estimates for very irregular
cases, in fact for general symmetric algebraic systems without any grids or any
other geometrical basis. This theory is nearly optimal for the crude (geometry-less) interpolations
it considers. To extend it to the prediction of the multigrid rates obtainable
with better (geometrically based) interpolations, it should be combined with some local
analysis, not yet developed.


6.  Non-Ellipticity  and  Slight  Ellipticity 

For  non-elliptic  differential  equations  (or  equations  with  small  ellipticity  measures 
on  scale  h,  which  for  numerical  purposes  is  the  same),  it  is  a  mistake  to  try  to  obtain 
uniformly  fast  convergence  per  cycle.  Much  simpler  and  more  efficient  algorithms  are 
obtained  by  allowing  components  with  larger  truncation  errors  (such  as  the  “characteristic 
components”)  to  converge  slower,  insisting  only  that  the  1-FMG  algorithm  still  solves 
the  problem  well  below  truncation  errors.  That  this  can  be  obtained  is  shown  by  modified 
types of local mode analysis (infinite-space FMG analysis supplemented by half-space FMG
analysis; see [6]). To check that this is indeed obtained in any particular run, one should
measure the differential convergence (the convergence to the differential solution, found
from differences between the FMG solutions on successive levels), rather than the algebraic
convergence (the reduction of residuals, which need not be fast here). It is in any case the
former that one really wants to know.

The  usual  FMG  algorithm  need  only  be  modified  in  case  of  consistent  alignment,  i.e., 
in  case  the  grid  is  consistently  aligned  with  the  characteristic  directions.  Such  alignment 
is  necessary  when  accuracy  is  desired  in  the  “characteristic  components”,  i.e.,  components 
which are smoother along than across characteristic lines. For obtaining that accuracy, $L^h$
should be non-$h$-elliptic, and the usual point-by-point relaxation will then smooth the error
only in the characteristic directions (in which semi-$h$-ellipticity is necessarily still maintained).
One should therefore either modify relaxation, by simultaneously relaxing points
along  characteristic  lines  (“line  relaxation”),  or  use  “semi  coarsening”,  i.e.,  a  coarser  grid 
whose  meshsize  is  larger  only  in  the  characteristic  directions.  Semi  coarsening,  sometimes 
combined  with  line  relaxation,  is  especially  recommended  in  higher-dimensional  situations 
where  the  alignment  is  not  in  lines  but  in  planes. 
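Local mode analysis makes the case for line relaxation concrete. The sketch uses a slightly elliptic stand-in, which is an assumption: the text's case is alignment with characteristics. It takes the anisotropic 5-point operator of $-\varepsilon u_{xx} - u_{yy}$, whose strong couplings run in $y$. It then compares the smoothing factors, under full coarsening, of two relaxation schemes:

- lexicographic point Gauss-Seidel;
- Gauss-Seidel with whole $y$-lines relaxed simultaneously.

The function names are hypothetical.

```python
import cmath
import math


def point_gs(t1, t2, eps):
    """Lexicographic point Gauss-Seidel amplification for the 5-point -eps*u_xx - u_yy."""
    old = eps * cmath.exp(1j * t1) + cmath.exp(1j * t2)            # not yet updated
    new = 2 * eps + 2 - eps * cmath.exp(-1j * t1) - cmath.exp(-1j * t2)
    return abs(old) / abs(new)


def y_line_gs(t1, t2, eps):
    """Gauss-Seidel with each y-line (the strong couplings) solved at once, swept in x."""
    old = eps * cmath.exp(1j * t1)
    new = 2 * eps + 2 - 2 * math.cos(t2) - eps * cmath.exp(-1j * t1)
    return abs(old) / abs(new)


def smoothing_factor(symbol, eps, m=64):
    """Worst amplification over high frequencies max(|t1|,|t2|) >= pi/2 (full coarsening)."""
    thetas = [math.pi * k / m for k in range(-m + 1, m + 1)]   # includes 0, pi/2, pi
    return max(symbol(t1, t2, eps) for t1 in thetas for t2 in thetas
               if max(abs(t1), abs(t2)) >= math.pi / 2)


for eps in (1.0, 1e-3):
    print(f"eps={eps:g}: point GS {smoothing_factor(point_gs, eps):.3f}, "
          f"y-line GS {smoothing_factor(y_line_gs, eps):.3f}")
```

For $\varepsilon = 1$ the point factor is about 0.5. For $\varepsilon = 10^{-3}$ it is close to 1: modes that oscillate in $x$ but are smooth in $y$ are hardly damped. The line factor stays at about $1/\sqrt5 \approx 0.45$ for both values of $\varepsilon$.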

Expensive procedures of alternating-direction line or plane relaxation are not needed
in  natural  coordinates,  since  only  consistent  alignment  matters  in  solving  to  the  level 
of  truncation  errors.  Such  expensive  procedures  will  however  very  often  be  needed  if 
anisotropic  coordinate  transformations,  and  nonuniform  gridline  spacings  in  particular,  are 
employed,  thereby  artificially  creating  excessively  strong,  grid-aligned  discrete  couplings.  It 
is  therefore  generally  not  recommended  to  use  global  grid  (or  coordinate)  transformations, 
but  instead  to  create  local  refinements  and  local  grid  curvings  in  the  multigrid  manner 
(see  Sec.  7). 

The  design  of  relaxation  for  non-elliptic  equations  should  use  modified  definitions 
of  smoothing  factors,  to  account  for  the  decreased  smoothing  needed  for  characteristic 
components  (see  [7,  §20.3.1]),  and  for  the  semi-coarsening,  whenever  applied  (see  [7,  §3.3]). 

For  non-elliptic  or  slightly  elliptic  problems  it  is  also  recommended  to  use  double 
discretization  schemes  (see  Sec.  8),  since  some  natural  (e.g.  central)  discretizations  are 
good for smooth components but bad for non-smooth ones. (See App. C for more about
fluid dynamics discretizations.)


7. FAS: Nonlinear Equations, Local Grid Adaptation, τ Extrapolation, Small Storage

In the Full Approximation Scheme (FAS) the coarse-grid unknown $v^H$ is replaced
by the unknown $u^H = \hat I_h^H \tilde u^h + v^H$, where $\hat I_h^H$ is another fine-to-coarse interpolation (or
averaging). In terms of $u^H$, the coarse grid equation (3.3) becomes

$$L^H u^H = F^H + \tau_h^H, \qquad (7.1)$$

where $F^H = I_h^H F^h$ and $\tau_h^H = L^H \hat I_h^H \tilde u^h - I_h^H L^h \tilde u^h$. This equation evidently has the form
of a “defect correction” (correcting $L^H$ by $L^h$, their difference being measured by $\tilde u^h$);
hence it makes full sense even in the case that $L$ is nonlinear.

Indeed, using FAS, nonlinear equations are solved as easily and as fast as linear ones. No
linearization is required (except for some local linearization, in relaxation, into $h$-principal
terms, which in the prevalent case of quasi-linear equations means no linearization at all).
The 1-FMG algorithm has solved, well below truncation errors, various flow problems,
including compressible and incompressible Navier-Stokes and Euler equations, problems
with shocks, constrained minimization problems (complementarity problems, with free
surfaces) and many others. “Continuation” techniques, sometimes needed for reaching the
solution “attraction basin”, can be incorporated at little extra computational cost (see Sec. 9).
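The following is a minimal FAS sketch. It rests on assumptions that are not from the text:

- the model nonlinear problem $-u'' + u^3 = f$ on $(0,1)$ with zero boundary values;
- pointwise Newton-Gauss-Seidel relaxation, which is the only linearization used;
- injection for $\hat I_h^H$ and full weighting for $I_h^H$;
- linear interpolation of the coarse-grid change $u^H - \hat I_h^H\tilde u^h$.

The names are hypothetical. The right-hand side of the coarse equation is formed as in (7.1).

```python
import math


def op(u, h, i):
    """Model nonlinear operator (an assumption for this sketch): -u'' + u^3 at point i."""
    return (2 * u[i] - u[i - 1] - u[i + 1]) / (h * h) + u[i] ** 3


def relax(u, f, h, sweeps):
    """Nonlinear Gauss-Seidel: one pointwise Newton step, the only linearization used."""
    for _ in range(sweeps):
        for i in range(1, len(u) - 1):
            u[i] += (f[i] - op(u, h, i)) / (2 / (h * h) + 3 * u[i] ** 2)


def fas_cycle(u, f, h, nu1=2, nu2=1):
    """FAS V(nu1,nu2) cycle: injection for the solution, full weighting for residuals."""
    n = len(u) - 1
    if n <= 2:
        relax(u, f, h, 30)                    # coarsest grid: a single unknown
        return
    relax(u, f, h, nu1)
    nc, H = n // 2, 2 * h
    uH = [u[2 * j] for j in range(nc + 1)]    # hat-I_h^H u^h by injection
    uH0 = list(uH)
    r = [0.0] + [f[i] - op(u, h, i) for i in range(1, n)] + [0.0]
    fH = [0.0] * (nc + 1)
    for j in range(1, nc):
        # right-hand side of (7.1): F^H + tau_h^H = L^H(hat-I u^h) + I_h^H (F^h - L^h u^h)
        fH[j] = op(uH, H, j) + 0.25 * r[2 * j - 1] + 0.5 * r[2 * j] + 0.25 * r[2 * j + 1]
    fas_cycle(uH, fH, H, nu1, nu2)
    d = [a - b for a, b in zip(uH, uH0)]      # interpolate only the coarse-grid change
    for j in range(nc + 1):
        u[2 * j] += d[j]
    for j in range(nc):
        u[2 * j + 1] += 0.5 * (d[j] + d[j + 1])
    relax(u, f, h, nu2)


N = 64
h = 1.0 / N
exact = [math.sin(math.pi * i * h) for i in range(N + 1)]
f = [math.pi ** 2 * s + s ** 3 for s in exact]   # manufactured so that u = sin(pi x)


def max_residual(u):
    return max(abs(f[i] - op(u, h, i)) for i in range(1, N))


u = [0.0] * (N + 1)
r0 = max_residual(u)
for _ in range(12):
    fas_cycle(u, f, h)
print(max_residual(u) / r0, max(abs(a - b) for a, b in zip(u, exact)))
```

The residual should drop by many orders of magnitude in a dozen cycles, much as in the linear case. The remaining error against $\sin(\pi x)$ is the $O(h^2)$ discretization error.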

In FAS, averages of the full solution are represented on all coarser grids (hence the
name of the scheme). This allows for various advanced techniques which use finer grids
very sparingly. For example, the fine grid may cover only part of the domain: outside
that part, (7.1) will simply be used without the $\tau_h^H$ term. One can use progressively finer
grids on increasingly more specialized subdomains, effectively achieving a non-uniform discretization
(needed near singularities) which still uses simple uniform grids, still has the
very fast multigrid solver, and yet is very flexible. Grid adaptation can in fact be incorporated
in this way into the FMG algorithm: on proceeding to finer levels, the algorithm
also defines their extent (see [5], [2] or [7, §9]). Moreover, each of the local refinement
grids may use its own local coordinate system, thus curving itself to fit boundaries, fronts,
characteristic directions or discontinuities (all of whose locations are already approximately
known from the coarser levels), with the additional possibility of using anisotropic meshsizes
(e.g., much finer across a front than along it). Since this curving is only local, it can
be  accomplished  by  a  trivial  transformation,  which  does  not  add  substantial  complexity 
to  the  basic  equations  (in  contrast  to  global  transformations). 

The fine-to-coarse correction $\tau_h^H$ gives a rough estimate of the local discretization
error. This can be used in grid adaptation criteria. It can also be used to $\tau$-extrapolate
the equations, in order to obtain a higher-order discretization for little extra work. This
extrapolation is more useful than the Richardson type, since it is local (extrapolating the
equation, not the solution): it can, for example, be used together with any procedure of
local refinements.
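A short derivation suggests what such a $\tau$-extrapolation can look like. This is a sketch under stated assumptions, not a formula quoted from this text:

- the differential solution $u$ is smooth;
- both grids use a discretization of order $p$;
- $F^H = I_h^H F^h$;
- the truncation errors are defined by $L^h u = F^h + \tau^h$ and $L^H u = F^H + \tau^H$, with the transfers of $u$ to the grids left implicit.

Evaluating $\tau_h^H$ at $\tilde u^h \approx u$ gives

$$
\tau_h^H \approx L^H u - I_h^H L^h u = \tau^H - I_h^H \tau^h \approx \tau^H - \tau^h,
\qquad \tau^H \approx 2^p\,\tau^h
\quad\Longrightarrow\quad
\tau^H \approx \frac{2^p}{2^p-1}\,\tau_h^H .
$$

Replacing (7.1) by $L^H u^H = F^H + \frac{2^p}{2^p-1}\,\tau_h^H$ therefore makes $u$ satisfy the coarse-grid equation up to higher-order terms. For a second-order scheme the factor is $4/3$.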

In view of (7.1), the role of grid $h$ is really only to supply the defect correction $\tau_h^H$ to
grid $H$. For that, only a local piece of the fine grid is needed at a time. Similarly, only a
piece of grid $H = 2h$ is needed at a time, to supply $\tau_{2h}^{4h}$, etc. This gives rise to algorithms
that can do with very small computer storage (even without using external storage!).


8.  Multigrid  Discretization  Techniques 


The above local refinements, local coordinates, refinement criteria, local $\tau$ extrapolations
and small-storage techniques were examples of using the multilevel apparatus to
obtain better discretizations, not just fast solvers. Other examples are:

Double discretization schemes. The discrete operator $L^h$ used in calculating the residuals
(3.2), for the global process of coarse grid corrections, does not need to coincide with
the one used in the local process of relaxation. The latter should have good local properties,
such as stability (possibly obtained by adding artificial viscosity) and admittance of
sharp discontinuities (achieved by using stencils that tend to avoid straddling steep gradients),
while the former should excel in global attributes, such as high accuracy (obtained
by omitting artificial viscosities and possibly using higher-order differencing) and conservation
(through conservative differencing). Such schemes do not converge to zero residuals,
of course, but can approximate the differential equations much better than either of their
constituent discretizations alone, especially in cases of conflicting requirements (cf. Sec. 6).

Multiple  representation  schemes.  The  coarse-grid  solution  representation  does  not 
need  to  coincide  with  that  on  the  fine  grid.  For  example,  some  nearly  singular  smooth 
components  (typical  in  slightly  indefinite  problems)  should  on  some  coarser  grids  be  singled 
out and represented by one parameter each (see [14]). Or, more importantly, highly oscillatory
components showing small normalized residuals (typical in standing wave problems,
as in acoustics, electromagnetism, Schrödinger equations, etc.) should be represented on
coarser  grids  by  their  slowly  varying  amplitudes.  The  coarser  the  grid  the  more  such  “rays” 
should  be  separately  represented.  Grids  fine  enough  to  resolve  the  natural  wavelength  can 
be  used  only  locally,  near  boundary  singularities,  where  ray  representations  break  down. 
This  hybrid  of  wave  equations  and  geometric  optics  can  treat  problems  which  neither  of 
them  can  alone,  in  addition  to  supplying  a  fast  solver  for  highly  indefinite  equations. 

Global  conditions  and  non-local  boundary  conditions  (radiation  conditions,  flow  exit 
boundaries,  etc.)  are  easily  incorporated,  by  transferring  their  residuals  from  fine  grids 
and  imposing  them  only  at  suitably  coarser  levels. 

Treating  large  domains  by  placing  increasingly  coarser  grids  to  cover  increasingly  wider 
regions. 

Fast  integrals.  In  case  of  integral  equations  with  suitably  smooth  kernels,  most  of 
the  work  involved  in  just  performing  the  integrations  can  be  spared,  by  performing  them 
mainly on coarser grids using suitable FAS versions (see App. A).

Finally, multigrid convergence factors always detect bad discretizations, especially
when “compatible coarsening” is used (see Sec. 3). Several previously unnoticed flaws
in widely accepted discretization schemes were discovered in this way. Furthermore, brief 1-FMG
algorithms  tend  to  correct  bad  discretizations,  by  being  very  slow  in  admitting  ill-posed 
components  (components  showing  small  residuals  compared  with  other  components  of 
comparable  smoothness).  For  example,  quasi-elliptic  discretizations  (resulting  e.g.  from 
central  differencing  on  non-staggered  grids  of  elliptic  systems  with  first-order  principal 
derivatives)  are  so  solved  with  their  highly-oscillating  bad  components  left  out  [13].  More 
generally,  the  FMG  algorithm  and  the  multi-level  structure  provide  effective  tools  to  deal 
with  ill-posed  problems,  whether  the  ill-posedness  is  in  the  differential  problem  or  only  in 
its  discretization:  finer  grids  can  be  introduced  (in  the  manner  of  Sec.  7)  only  where  their 
scale  does  not  admit  ill-posed  components;  nonlinear  controlling  constraints,  either  global, 
local or at any intermediate scale, are easily incorporated; etc. (On eliminating ill-posedness
by augmented optimization, see Sec. 11.)


9.  Compound  Problems  and  Problem  Sequences 

A  compound  problem  is  one  whose  solution  would  normally  involve  solving  several, 
or  even  many,  systems  of  equations  similar  to  each  other.  With  multilevel  techniques,  the 
work  of  solving  a  compound  problem  can  often  be  reduced  to  that  of  solving  just  one  single 
system,  or  just  a  fraction  more. 

Take  for  example  continuation  (embedding)  processes,  in  which  a  problem  parameter 
is  gradually  changed  in  order  to  drive  the  approximate  solutions  into  the  attraction  basin 
of  the  desired  solution  to  some  target  nonlinear  problem.  Flow  problems,  for  instance, 
are  easily  solved  for  the  case  of  large  viscosity,  which  can  then  gradually  be  lowered  to 
the  desired  level,  with  the  equations  being  solved  at  each  step  taking  the  previous-step 
solution  to  serve  as  a  first  approximation.  This  process  is  almost  automatically  performed 
by  the  FMG  algorithm  (Sec.  4)  itself,  since  it  starts  on  coarse  levels,  where  a  large  artificial 
viscosity  is  introduced  by  the  discretization,  and  then  gradually  works  its  way  to  finer  grids 
with  proportionately  smaller  viscosity.  The  process,  by  the  way,  can  then  be  continued 
to  still  lower  viscosity  by  using  still  finer  levels  only  locally  (see  Sec.  7),  at  regions  where 
the  size  of  viscosity  matters  (i.e.,  where  the  flow  is  driven  by  viscosity),  and  eliminating 
viscosity  elsewhere  (e.g.  by  double  discretization  -  see  Sec.  8). 

One 1-FMG algorithm, with no extra iterations, can even be directed to locate limit
points (turning and bifurcation points) on solution diagrams; or to optimize some problem
parameters, including optimization of boundary shapes, diffusion coefficients, control parameters,
etc.; or to trace free boundaries, strong shocks, and other discontinuities; or to
solve related inverse problems (e.g., system identification); and so on, all with accuracy
below truncation errors.

In  many  cases,  however,  repeated  applications  of  the  FMG  solver  are  still  needed: 
cases of complicated bifurcation diagrams, interactive design situations, etc. Even then,
the  multigrid  machinery  generally  provides  for  extremely  cheap  re-solving:  one  should  only 
be  careful  to  apply  FMG  to  the  incremental  problem  (calculating  only  the  change  from  the 
old  solution;  using  FAS  this  is  easily  done  even  in  nonlinear  problems)  and  to  skip  finer 
grids  (or  parts  thereof)  wherever  they  describe  negligible  high-frequency  changes. 

In designing a structure, for example, one often wants to re-solve the elasticity equations
after modifying some part of the structure. The changes in the solution are then
very smooth, except near the modified part. In incremental re-solving one therefore needs
the fine grid $h$ only near that part, while in other regions the coarser grid $H$ can suffice,
provided the $\tau_h^H$ correction (see (7.1)) is kept frozen in those regions at its previous
(pre-modification) values (otherwise one ignores the high-frequency components themselves, not
just their changes). Similarly, at some larger distance from the modified part, grid $H = 2h$
itself can also be omitted, then grid $4h$, etc. In this way re-solving can be so inexpensive
in computer time and storage as to allow on-line interactive design of complicated structures.
Similar frozen-$\tau$ techniques can be used in continuation processes and in evolution
problems.


10.  Evolution  Problems 

Some  time-dependent  problems  may  need  no  multileveling.  These  are  hyperbolic 
schemes  where  all  the  characteristic  velocities  are  comparable  to  each  other,  and  their 
explicit  discretization  on  one  grid  is  therefore  fully  effective:  the  amount  of  processing  is 
essentially  equal  to  the  amount  of  physical  information.  However,  as  soon  as  any  stiffness 
enters, implicit discretization and multigrid techniques similar to those in Sec. 9 become
desirable.

Solving  the  sequence  of  implicit  systems,  the  1-FMG  algorithm  is  all  one  needs  per 
time  step  -  provided  it  is  consistently  applied  to  the  time  incremental  problem,  since 
one  needs  to  solve  to  the  level  of  the  incremental  (not  the  cumulative)  truncation  errors. 
Moreover, in most cases, notably in parabolic problems, this work can be vastly reduced,
because  most  of  the  time  at  most  places  the  increment  is  very  smooth,  hence  seldom 
requires  fine-grid  processing. 

For example, it has been demonstrated for the heat equation $\partial u/\partial t = \Delta u + F$ with
steady boundary conditions and steady sources $F$ that, given any initial conditions at $t = 0$,
the solution at any target time $T$ can be calculated, to the level of spatial truncation errors,
in less than 10 work units, where the work unit here is the work invested in one explicit
time step. To obtain the solution with that accuracy throughout the interval $0 < t < T$,
the number of required work units is $O(\log p)$.

By  combining  methods  developed  for  such  purely  parabolic  problems  with  the  method 
of characteristics, it may be possible to obtain similar results for problems with convection,
because  the  time  increment  can  be  described  as  a  smooth  change  superposed  on  pure 
convection. 
All multigrid discretization techniques (see Sec. 8) can be useful for time-dependent
problems, too. One example: the popular Crank-Nicolson discretization, which offers
superior accuracy for smooth components, has the disadvantage of badly treating high-frequency
components at large time steps. This conflict is easily resolved by a double
discretization scheme, which at some initial time steps, and only in the fine grids’ relaxation
process, replaces Crank-Nicolson by the fully implicit scheme. Other examples
which have already been used include local refinements, the $\tau$ refinement criteria, $\tau$
extrapolations, and a treatment of an ill-posed (the inverse heat) problem.
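The conflict can be seen mode by mode. Take the model mode $du/dt = -\lambda u$, an illustrative assumption standing for one eigenmode of the discrete heat operator, and write $z = \lambda\,\Delta t$. The per-step amplification is $(1 - z/2)/(1 + z/2)$ for Crank-Nicolson and $1/(1+z)$ for the fully implicit scheme. The sketch evaluates both for a smooth mode and for the most oscillatory mode of a 1D grid, $\lambda \approx 4/h^2$. The grid and time step are arbitrary choices.

```python
import math


def amplification(z, scheme):
    """Per-step factor for one eigenmode of du/dt = -lam*u, with z = lam*dt >= 0."""
    if scheme == "crank-nicolson":
        return (1 - z / 2) / (1 + z / 2)
    if scheme == "fully-implicit":
        return 1 / (1 + z)
    raise ValueError(scheme)


h, dt = 1.0 / 64, 0.01
for name, lam in (("smooth", math.pi ** 2), ("oscillatory", 4 / h ** 2)):
    z = lam * dt
    print(f"{name:12s} exact {math.exp(-z):.4f}  "
          f"CN {amplification(z, 'crank-nicolson'):+.4f}  "
          f"FI {amplification(z, 'fully-implicit'):+.4f}")
```

For the oscillatory mode, Crank-Nicolson gives a factor near $-1$: the mode flips sign and barely decays. The fully implicit factor for that mode is nearly 0. For the smooth mode, Crank-Nicolson is much closer to the exact $e^{-z}$.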

Time-periodic solutions, or, more generally, solutions with the same solution growth
$w$ per time period, can be computed inexpensively, for any spatial grid $h$, by integrating
basically on grid $2h$: once a steady growth has been established on grid $2h$, a defect
correction $\tau_h^{2h}$ can be found by integrating one period on grid $h$; then the calculations
on grid $2h$ resume, with that defect correction added at each period, until a new steady growth is
established. The calculations on grid $2h$ can similarly be done by integrating basically
on grid $4h$, and so on. Each grid integration may of course also use the above frozen-$\tau$
techniques.


11.  Geometrically  Based  Problems:  AMG 

Most large systems, even those not derived from discretized continuous problems, still
have a geometric basis; that is, each unknown has a location in some low-dimensional (usually
at most 4-dimensional) underlying space (indeed, the unknowns are often still arranged in
lattices), and the equations reflect this geometry, e.g., by more strongly coupling closer
unknowns. Examples abound (see Sec. 1). Excluding for the moment probabilistic aspects
(see Sec. 14), these systems can usually be cast as minimization problems: the solution
vector $u$ should minimize some functional $E(u)$, called “energy”. This naturally leads to
various Gauss-Seidel-type relaxation schemes, in which $E$ is decreased as far as possible
by changing one unknown (or one block of unknowns) at a time. (Kaczmarz relaxation in
Sec. 2 can be viewed as Gauss-Seidel for $\tilde u$, where $u = A^T \tilde u$ and $E(\tilde u) = \tfrac12 u^T u - \tilde u^T b$.)
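The equivalence in the parenthetical can be checked numerically. The sketch runs Kaczmarz sweeps on $Au = b$ next to Gauss-Seidel sweeps on $AA^T\tilde u = b$, which minimize $E(\tilde u)$. It then compares $u$ with $A^T\tilde u$. The small nonsymmetric system and the function names are hypothetical. The two sequences agree up to rounding because each Gauss-Seidel step on $\tilde u_i$ changes $u = A^T\tilde u$ by a multiple of the $i$-th row of $A$, exactly as a Kaczmarz step does.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def kaczmarz_sweep(A, b, u):
    """One Kaczmarz sweep for A u = b: project u onto each equation's hyperplane in turn."""
    for row, bi in zip(A, b):
        delta = (bi - dot(row, u)) / dot(row, row)
        for k, a in enumerate(row):
            u[k] += delta * a


def gauss_seidel_sweep(M, b, w):
    """One Gauss-Seidel sweep for M w = b, i.e. minimizing 0.5 w^T M w - w^T b."""
    for i, row in enumerate(M):
        w[i] += (b[i] - dot(row, w)) / row[i]


A = [[2.0, 1.0, 0.0], [0.0, 3.0, 1.0], [1.0, 0.0, 4.0]]   # hypothetical test system
b = [1.0, 2.0, 3.0]
AAT = [[dot(r, s) for s in A] for r in A]                 # A A^T
AT = [list(col) for col in zip(*A)]                       # A^T

u, w = [0.0] * 3, [0.0] * 3
for sweep in range(5):
    kaczmarz_sweep(A, b, u)
    gauss_seidel_sweep(AAT, b, w)
    u_from_w = [dot(row, w) for row in AT]                # u = A^T w
    print(sweep, max(abs(x - y) for x, y in zip(u, u_from_w)))
```

The printed differences stay at rounding level.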

Excluding  now  the  case  of  discrete  or  partly  discrete  unknowns  (see  Sec.  13),  in  all 
such  geometrically- based  systems  the  slow  components  (see  Sec.  2)  are  either  “smoothly 
representable”  or  ill-posed.  A  general  smooth  representation  of  components  is  for  example 
by short sums of terms such as $a(x)\varphi(x)$, where $a(x)$ is smooth (at least in some directions)
while $\varphi(x)$ may be highly non-smooth but is fixed and known (or easily computable). A
multilevel solver can then be constructed in which $a(x)$ is interpolated from coarser levels.
The coarser-level equations may be derived either variationally (i.e., from the requirement
that $E(u)$ be reduced as far as possible by the interpolated $a(x)$), or by simulating direct
differencing approximations (as in [40], but for the equations in terms of $a(x)$).

A multigrid solver of the latter kind has been constructed for simple cases of lattice
Dirac equations in gauge fields. (In this case $\varphi(x)$ mainly represents the “gauge invariance”.)
In QED and QCD (quantum electrodynamics and chromodynamics) simulations,
this type of equation has to be solved at each Monte-Carlo iteration, consuming enormous
computer resources (see e.g. [18]). This solver, which is itself also employed for updating
$\varphi(x)$, exhibits the usual multigrid speed, and requires only a short cycle per Monte-Carlo
iteration, costing far less than the rest of the calculations. (See also Sec. 12.)

In many problems, including first-kind integral equations in fields like image reconstruction,
tomography and crystallography, there exist slow components which are not
smoothly representable. Since they give large errors for small residuals without being
smooth in any sense, they are by definition ill-posed. Such error components are introduced
only very slowly by the multigrid solvers. Hence they are harmful only insofar as
their absence causes the solution to “look bad”. Specifying what “looking bad” means can
be done by augmenting $E(u)$ and/or by imposing nonlinear constraints. Such constraints,
on any scale, can be incorporated in the multilevel solver (see [11]). The augmented constrained
optimization problem may have some discrete-state secondary variables (such as
edges, in image restoration); therefore (or for other reasons) it may be afflicted with many
local optima, in which case multilevel annealing (see Sec. 13) will have to be used.

Multilevel solvers can be constructed even when the geometric basis is not explicit. In such “algebraic multigrid” (AMG) solvers the coarse-level variables are typically selected by the requirement that each fine-level variable be “strongly connected”, by the fine-level equations, to at least some coarse-level variables. The coarse-to-fine and fine-to-coarse transfers may also be based purely on the algebraic equations, although geometrical information may be used too (see [9], [29]). AMG solvers are good as black boxes, even for discretized PDEs, since they require no special attention to boundaries, anisotropies and strong discontinuities, and no well-organized grids (allowing, e.g., general-partition finite elements).
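The coarse-variable selection rule just described can be sketched as a greedy splitting into coarse (C) and fine (F) points. This is a minimal sketch under assumptions not taken from the text: a dense matrix, the common strength criterion $-a_{ij} \ge \theta \max_{k \ne i}(-a_{ik})$ with a hypothetical threshold θ = 0.25, and a simplified selection order; the AMG solvers of [9], [29] refine this considerably.

```python
def strong_connections(A, theta=0.25):
    """For each row i, the set of j with -A[i][j] >= theta * max_{k != i}(-A[i][k]).

    A is a dense list-of-lists matrix; theta is an assumed threshold.
    """
    n = len(A)
    S = []
    for i in range(n):
        m = max((-A[i][k] for k in range(n) if k != i), default=0.0)
        S.append({j for j in range(n)
                  if j != i and m > 0 and -A[i][j] >= theta * m})
    return S


def select_coarse(A, theta=0.25):
    """Greedy C/F splitting: every F point gets at least one strong C neighbour."""
    n = len(A)
    S = strong_connections(A, theta)
    influences = [set() for _ in range(n)]  # influences[j]: points i with j in S[i]
    for i in range(n):
        for j in S[i]:
            influences[j].add(i)
    undecided, C, F = set(range(n)), set(), set()
    while undecided:
        # choose the undecided point on which most undecided points depend
        # (ties broken by the smallest index)
        c = max(undecided, key=lambda k: (len(influences[k] & undecided), -k))
        C.add(c)
        undecided.discard(c)
        for i in influences[c] & undecided:  # these now have a strong C neighbour
            F.add(i)
            undecided.discard(i)
    return C, F
```

For the 1-D Laplacian stencil (−1, 2, −1) on 7 unknowns this sketch returns C = {1, 4, 6}, and each of the F points 0, 2, 3, 5 is strongly connected to a C point.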


12. Calculating Determinants


At each Monte-Carlo iteration in QED and QCD simulations, what is really required is not just to solve the lattice Dirac equations (see Sec. 11), but also to calculate $\delta \log\det Q$, where Q is the matrix of that system and δ denotes the change per iteration. Since the steps are small, $\delta \log\det Q \approx \operatorname{tr}(Q^{-1}\,\delta Q)$, and to calculate this one needs to know $(Q^{-1})_{ij}$ for all pairs i and j that are neighbors on the lattice. Now, it can be shown that by storing and updating similar neighboring values for coarse-grid approximations to Q (for which purpose one also needs to store and update the function φ(x) mentioned in Sec. 11), all updates can immediately be done. The implied coarse-level work, including the coarsening of Q, is just a small overhead. Even when the step δQ is not small, if it is local, its local effect on $Q^{-1}$ is easily accounted for; hence $\delta \log\det Q$ can still readily be calculated.

This approach leads to a general fast method for calculating determinants of lattice equations.
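The first-order identity behind this, $\delta\log\det Q \approx \operatorname{tr}(Q^{-1}\delta Q) = \sum_{i,j}(Q^{-1})_{ji}\,\delta Q_{ij}$, shows why only the entries of $Q^{-1}$ at positions where δQ is nonzero (the diagonal and lattice-neighbor pairs) are needed. The sketch below checks it numerically on a small 1-D lattice matrix. It uses dense elimination purely for illustration (the helper names are hypothetical), not the coarse-grid updating scheme described in the text.

```python
import math


def log_abs_det(A):
    """log|det A| by Gaussian elimination with partial pivoting."""
    M = [row[:] for row in A]
    n = len(M)
    total = 0.0
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        pivot = M[k][k]
        total += math.log(abs(pivot))
        for r in range(k + 1, n):
            f = M[r][k] / pivot
            for c in range(k, n):
                M[r][c] -= f * M[k][c]
    return total


def inverse(A):
    """Gauss-Jordan inverse (dense; fine for a small illustration)."""
    n = len(A)
    M = [A[i][:] + [1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        pivot = M[k][k]
        M[k] = [x / pivot for x in M[k]]
        for r in range(n):
            if r != k:
                f = M[r][k]
                M[r] = [a - f * b for a, b in zip(M[r], M[k])]
    return [row[n:] for row in M]


def delta_log_det_estimate(Q, dQ):
    """tr(Q^{-1} dQ) = sum_{ij} (Q^{-1})_{ji} dQ_{ij}; only entries with dQ_{ij} != 0 contribute."""
    Qinv = inverse(Q)
    n = len(Q)
    return sum(Qinv[j][i] * dQ[i][j]
               for i in range(n) for j in range(n) if dQ[i][j] != 0.0)
```

For a small local perturbation the estimate agrees with the exact change $\log\det(Q+\delta Q) - \log\det Q$ up to second-order terms in δQ.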
13. Discrete-State Minimization: Multilevel Annealing

In statistical physics, combinatorial optimization (e.g., the traveling salesman problem, or integrated circuit design), pattern recognition, econometrics, and many other fields, the unknowns $u_i$, or some of them, may only assume discrete states. A typical example is Ising spins, where $u_i = \pm 1$. Minimizing E(u) in such problems is far more intricate than in continuous-state problems, since the relaxation process is not only slow, but is very likely to get trapped in a “local minimum”, i.e., in a configuration u which is not the true minimum but for which no allowable change of any one $u_i$, or even of a small block of them, can lower E.

“Simulated annealing” is a general technique for trying to escape such local minima by assigning at each step a certain probability for the energy to grow. This is done by simulating thermal systems: to each configuration u the “Boltzmann probability”

$$P(u) = \frac{e^{-\beta E(u)}}{Z(\beta)} \tag{13.1}$$

is assigned (physically, 1/β is proportional to the absolute temperature and Z(β) is a normalization factor), and the above strict-minimization relaxation sweeps are replaced by “Monte-Carlo iterations”, in which each change of a $u_i$ is governed by (13.1). Gradually and carefully β is increased (the system is “cooled”) so that the Monte-Carlo process tends back to strict minimization. (See [23].)
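A minimal simulated-annealing sketch for a two-dimensional ferromagnetic Ising lattice follows. Assumptions not in the text: a periodic L × L lattice, energy $E(u) = -J\sum u_iu_j$ over nearest-neighbor bonds (each bond counted once), the Metropolis acceptance rule $\min(1, e^{-\beta\Delta E})$ as the Monte-Carlo step, and an arbitrary linear β schedule.

```python
import math
import random


def ising_energy(s, J):
    """E(u) = -J * sum over nearest-neighbour bonds (each counted once) of s_i*s_j, periodic."""
    L = len(s)
    return -J * sum(s[x][y] * (s[(x + 1) % L][y] + s[x][(y + 1) % L])
                    for x in range(L) for y in range(L))


def metropolis_sweep(s, J, beta, rng):
    """One Monte-Carlo sweep: each flip is accepted with probability min(1, exp(-beta*dE)),
    which leaves the Boltzmann distribution (13.1) invariant."""
    L = len(s)
    for x in range(L):
        for y in range(L):
            nb = s[(x + 1) % L][y] + s[(x - 1) % L][y] + s[x][(y + 1) % L] + s[x][(y - 1) % L]
            dE = 2.0 * J * s[x][y] * nb  # energy change if s[x][y] is flipped
            if dE <= 0 or rng.random() < math.exp(-beta * dE):
                s[x][y] = -s[x][y]


def anneal(L=8, J=1.0, betas=None, sweeps_per_beta=20, seed=0):
    """Start from random spins and 'cool' by slowly increasing beta (assumed schedule)."""
    rng = random.Random(seed)
    s = [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
    betas = betas or [0.1 * k for k in range(1, 31)]
    for beta in betas:
        for _ in range(sweeps_per_beta):
            metropolis_sweep(s, J, beta, rng)
    return s
```

Passing `beta=float("inf")` to `metropolis_sweep` gives the strict-minimization sweep, in which no energy-increasing flip is ever accepted.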

In many cases, unfortunately, the global minimum is likely to be reached only if β is increased impractically slowly, requiring exponentially growing computer times; otherwise the process will be trapped in some local minimum with a large “attraction basin” (usually containing smaller-scale sub-basins from which the process does escape). This difficulty is removed by multilevel annealing, based on the following principles:

(i) A hierarchy of changes is selected. In two-dimensional Ising spin lattices, for example, a change on level l is defined as the simultaneous flipping (sign reversal) of all the spins in a $2^l \times 2^l$ block. (ii) Each coarse-level change is decided only after recursively calculating its effects (i.e., minimizing around it) at all finer levels, starting from the finest. (iii) At each level a specific β, just large enough to escape local minima on that scale, is first employed; then, still at that level, strict minimization (β = ∞) follows. (iv) To offset undesired effects of stochasticity, a procedure (called LCC) is added, separately at each level, for keeping track of the best configuration so far, by updating portions of it with portions from the evolving configurations. (See [12].)
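Principles (i) and (iii) can be illustrated for Ising spins by computing the energy change of a level-l block flip, where only bonds crossing the block boundary change sign, and by strict (β = ∞) minimization with such flips at a single level. This is a hypothetical sketch using the same assumed periodic nearest-neighbor energy as an ordinary Ising lattice; it omits the recursive minimization of principle (ii) and the LCC procedure of principle (iv).

```python
def ising_energy(s, J):
    """E = -J * sum over nearest-neighbour bonds (each counted once) of s_i*s_j, periodic."""
    L = len(s)
    return -J * sum(s[x][y] * (s[(x + 1) % L][y] + s[x][(y + 1) % L])
                    for x in range(L) for y in range(L))


def block_flip_delta(s, J, x0, y0, level):
    """Energy change of flipping the 2**level x 2**level block with corner (x0, y0).

    Only bonds crossing the block boundary change sign, so dE = 2*J*sum(s_i*s_j)
    over those bonds (assumes an L x L periodic lattice with L >= 3).
    """
    L = len(s)
    b = 2 ** level
    inside = {((x0 + dx) % L, (y0 + dy) % L) for dx in range(b) for dy in range(b)}
    dE = 0.0
    for x, y in inside:
        for nx, ny in (((x + 1) % L, y), ((x - 1) % L, y), (x, (y + 1) % L), (x, (y - 1) % L)):
            if (nx, ny) not in inside:
                dE += 2.0 * J * s[x][y] * s[nx][ny]
    return dE


def block_descent(s, J, level):
    """Strict minimization (beta = infinity) at one level: flip aligned blocks while that lowers E."""
    L = len(s)
    b = 2 ** level
    improved = True
    while improved:
        improved = False
        for x0 in range(0, L, b):
            for y0 in range(0, L, b):
                if block_flip_delta(s, J, x0, y0, level) < 0:
                    for dx in range(b):
                        for dy in range(b):
                            s[(x0 + dx) % L][(y0 + dy) % L] *= -1
                    improved = True
    return s
```

For example, on an all-up 8 × 8 lattice containing one aligned 2 × 2 island of down spins, no single-spin flip lowers E (each island spin has energy change 0), so level-0 descent is stuck, while the level-1 flip of the island lowers E by 16J.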

These principles were applied to difficult two-dimensional lattice problems with N Ising spins. The global minimum has always been reached in $O(N^{3/2})$ to $O(N^2)$ computer operations. The parallel-processing complexity is polynomial in log N. Similar algorithms are being developed for the traveling salesman problem. (The “statistical” TSP with N cities is solved in O(N) operations.)

The above principles should also apply in many problems where the discrete-state nature is less obvious. Take for example XY spins or Heisenberg spins, where each $u_i$ is a 2- or 3-dimensional vector of length 1. Although each $u_i$ can change continuously, some large-scale topological features of the field of spins (such as the existence of closed curves along which the spins gradually rotate a full circle) can only change discretely. Similar situations arise in x-ray crystallography and protein folding problems. Another example: in image reconstruction, each unknown $u_i$, representing the grey level in the i-th pixel, can be considered continuous, but nonlinear constraints that should be added to the problem (cf. Sec. 11) may well include discrete elements, such as the appearance of an “edge”. In each of these cases a certain combination of multilevel annealing with classical multigrid should be used. More generally, coarse-level annealing should apply in any minimization problem with large-scale local minima, and multilevel annealing is required whenever a hierarchy of attraction basins is involved.
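The discrete topological quantity mentioned above can be computed directly: for XY spins given by their angles along a closed curve, the number of full rotations is the sum of the wrapped angle differences divided by 2π. The sketch assumes that neighboring spins on the curve differ by less than π, so the wrapping is unambiguous; small continuous changes of the spins then leave the integer unchanged.

```python
import math


def winding_number(angles):
    """Number of full turns made by XY spins (angles in radians) around a closed loop."""
    total = 0.0
    n = len(angles)
    for k in range(n):
        d = angles[(k + 1) % n] - angles[k]
        d = (d + math.pi) % (2 * math.pi) - math.pi  # wrap the step into [-pi, pi)
        total += d
    return round(total / (2 * math.pi))
```

A loop of 12 spins rotating once counterclockwise gives 1; reversing the loop gives −1; adding a small perturbation to every angle does not change either value.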


14. Statistical Problems. Multilevel Monte-Carlo


The aim in statistical physics is to calculate various average properties of configurations governed by the probability distribution (13.1). This is usually done by measuring those averages over a sequence of “Monte-Carlo iterations”, in which each $u_i$ in its turn is randomly changed in a way that obeys (13.1) (using, e.g., the Metropolis rule [27]). Unfortunately, in such processes statistical equilibrium is usually reached very slowly, and, more severely, even when it has been reached, some averages are still very slow to converge, especially the long-range correlations the physicist needs most.

These troubles and others may be cured by multilevel Monte-Carlo techniques, in which coarse-level changes (changing the solution in preassigned blocks in preassigned patterns) are added and averaged over. In some problems this can be done quite straightforwardly, but in more interesting cases “stochastic coarsening” procedures should be employed. See details in Appendix B.


15. Linear Programming (LP)


A multilevel approach, called iterative aggregation, has been developed for LP problems (see [16], [31]), especially for situations in which the planned system is naturally divided into a hierarchy of sectors and sub-sectors. This considerably speeds up the calculations, and also provides the manager with a very useful hierarchical view of the system.

For very large systems, more refined aggregations are needed to obtain the typical speed of multigrid solvers. This can easily be done, for example, in problems with a geometrical basis (cf. Sec. 11), such as the LP transportation problem (see e.g. [19]). Recent tests were made with a method that lumps together two (or so) neighboring destinations into a “block destination”, two neighboring blocks into a super-block, etc. Shipping costs to a block are determined from the current intra-block marginal costs. It turns out that a 1-FMG-like algorithm gets very close to (practically obtains) the solution. The required work is even smaller, since many of the blocks that are supplied by one origin need no fine-level processing. Savings of several orders of magnitude, compared to simplex solutions, were indicated.


16. Historical Note


Various multilevel solution processes have independently occurred to many investigators (see partial list in [5]). The earliest we know is Southwell’s acceleration of relaxation by “group relaxation” [30], a two-level algorithm. The first to describe a recursive procedure with more than two levels is Fedorenko [17]. Similar approaches were introduced early to economic planning (see Sec. 15). All these early works lacked full understanding of the real efficiency that can be obtained by multileveling, and of how to obtain it, since they did not regard the fine-grid processes as strictly local, and hence thought in terms of too-crude aggregations. Fedorenko’s estimates of the work involved in solving simple Poisson equations are off by a factor of $10^4$, for example. Fully efficient multigrid algorithms, based on local analysis, were first developed at the Weizmann Institute in 1970-1972 (see [4]), leading then to most of the developments reported in the present article.


Appendices

A. Integral Equations and Many-Body Interactions


When an integral equation of the general type

$$\int_\Omega K(x,y)\,u(y)\,dy + f\bigl(x,u(x)\bigr) = 0, \qquad x \in \Omega \subset \mathbb{R}^d \tag{A.1}$$

is discretized in a usual way on a grid with $n = O(h^{-d})$ points, the unknowns are all connected to each other; the matrix of the (linearized) discrete system is full. A solution by elimination would require $O(n^3)$ operations. An FMG solution would require $O(n^2)$ operations, since each relaxation sweep costs $O(n^2)$ operations. Even when (A.1) is ill posed (Fredholm equation of the first kind), the FMG solver can still be as effective (cf. Sec. 11). In case (A.1) is nonlinear in u, FAS-FMG should be used, still retaining the same efficiency (see Sec. 7).

Potentially, however, the most important contribution of the multilevel approach to the solution of integral equations is not in such FMG solvers (in fact, for second-kind Fredholm equations, $O(n^2)$ solution times are already nearly obtained by simple relaxation), but in reducing the multi-integration work far below $O(n^2)$, sometimes to O(n) and most often to O(n log n), by exploiting smoothness properties of K. This is done by using the FAS structure in the following special way (first presented in [8, §8.6]).
The discretization of (A.1) on grid h has the form

$$\sum_{j=1}^{n} K^{hh}_{ij}\, u^h_j + f\bigl(x_i, u^h_i\bigr) = 0 \qquad (i = 1,\dots,n), \tag{A.2}$$

where i and j are multi-indices, $x_i = ih$, and $u^h_i = u^h(x_i)$ approximates $u(x_i)$. Since K(x,y) is fully known, (A.2) is essentially obtained by performing the integration in (A.1) with u(y) replaced by values polynomially interpolated from the grid values $u^h_j$. Hence the chosen discretization (e.g., the order of the polynomial interpolations), and its truncation errors, depend solely on the smoothness of u, not of K. If K(x,y) as a function of y is much smoother than u(y), an error much smaller than the truncation error would be introduced when $K^{hh}_{ij}$ is replaced by $\tilde K^{hh}_{ij} = \sum_J K^{hH}_{iJ}\,(I^h_H)_{jJ}$, where $K^{hH}$ is the restriction (injection) of $K^{hh}$ ($K^{hh}_{ij}$ as a function of j, for a fixed i) to the coarse grid H, and $I^h_H$ is an interpolation from H to h of a sufficiently high order. This replaces each summation in (A.2) by $\sum_J K^{hH}_{iJ}\, u^H_J$, where J runs over the coarse grid and $u^H = (I^h_H)^T u^h$, the superscript T denoting the adjoint (matrix transposition). Hence, choosing $I^H_h = (I^h_H)^T$ for the FAS fine-to-coarse solution averaging (see Sec. 7), the summation is done on the coarse grid. Moreover, if K(x,y) is similarly smooth as a function of x, each $K^{hH}_{iJ}$ can likewise be replaced by interpolation from $K^{HH}_{IJ}$, so that those coarse-grid summations will actually be calculated only for coarse-grid values of i. If K is sufficiently smooth, one can similarly replace grid-H summations by summations on still coarser grids, etc. All this can easily be incorporated into the FAS-FMG algorithm; it simply requires that, at each level h, $\sum_j K^{hh}_{ij} u^h_j$ be stored along with $u^h_i$, its values (on finer levels) being interpolated from level H along with the interpolation of $u^H$ or its corrections.
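The algebra behind replacing $K^{hh}_{ij}$ by its coarse-grid interpolant can be checked in one dimension: interpolating the kernel in j and summing on the fine grid equals summing on the coarse grid against $u^H = (I^h_H)^T u^h$. The sketch below assumes linear interpolation (lower order than the text recommends), a simple quadrature weight h, and one coarsening level in y only; the x-direction coarsening and the recursion to coarser grids are omitted, so it demonstrates the identity rather than the full O(n) algorithm.

```python
import math


def restrict_transpose(u):
    """u^H = (I_H^h)^T u^h for 1-D linear interpolation I_H^h (fine grid has 2m+1 points)."""
    m = (len(u) - 1) // 2
    uH = []
    for J in range(m + 1):
        s = u[2 * J]
        if 2 * J - 1 >= 0:
            s += 0.5 * u[2 * J - 1]
        if 2 * J + 1 < len(u):
            s += 0.5 * u[2 * J + 1]
        uH.append(s)
    return uH


def fine_sums(K, x, u, h):
    """Direct O(n^2) evaluation of h * sum_j K(x_i, x_j) * u_j for every i."""
    return [h * sum(K(xi, xj) * uj for xj, uj in zip(x, u)) for xi in x]


def coarse_sums(K, x, u, h):
    """The same sums with K(x_i, .) replaced by its linear interpolant from grid H = 2h,
    evaluated as h * sum_J K(x_i, X_J) * ((I_H^h)^T u)_J with half the kernel calls."""
    X = x[::2]  # injection of the grid points to the coarse grid
    uH = restrict_transpose(u)
    return [h * sum(K(xi, XJ) * uJ for XJ, uJ in zip(X, uH)) for xi in x]
```

With a kernel that is linear in y the two sums agree to rounding error; with a smooth kernel such as $e^{-(x-y)^2}$ they differ by $O(H^2)$, however oscillatory u is.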

It is easy to see that if the order of smoothness of K is twice that of u, this algorithm (with the sums $w^h_i = \sum_j K^{hh}_{ij} u^h_j$ being interpolated at all levels up to grid $h_1 = O(h^{1/2})$) will solve the problem to the level of truncation errors in O(n) operations. In most physical problems the smoothness of K(x,y) increases indefinitely with increasing distance $|x-y|$. In such cases the algorithm can be used (all the way to the coarsest grid), but at each level h the values of $w^h_i$ should be corrected (after being interpolated from $w^H$) by summation over some m points on grid h in the vicinity of $x_i$. For example, for potential-type equations (i.e., $K(x,y) = \log|x-y|$ or $K(x,y) = |x-y|^{-1}$) the algorithm should be used with m = O(log n), and the order of $I^h_H$ should also be O(log n), resulting in O(n log n) solution time.

This algorithm can be used in the same way even when the given grid is non-uniform; e.g., when the given grid points represent an actual self-gravitating set of point masses. To facilitate high-order interpolations, the best way in this case may be to organize the coarser grids in a semi-uniform structure, based on a collection of progressively finer uniform grids defined over increasingly more specialized subdomains (cf. Sec. 7), so that they evenly resolve the given non-uniform grid. Any many-body interaction can very effectively be computed by this tool, including the recovery of velocities from vortices in fluid vortex methods and the updating of potential fields in moving-body simulations.

Recently, an algorithm which solves potential-type equations in $O(n(\log n)^3)$ operations has been presented [33]. It seems to be substantially slower than the above algorithm, less general and more complicated.


B. Multilevel Monte-Carlo

Suppose a grid function u has the Boltzmann distribution (13.1), and the task is to calculate averages $\langle M(u)\rangle = \int P(u)\,M(u)\,du$ for various functionals M(u). Assume first that each $u_j$, the value of u at gridpoint $x_j = jh = (j_1,\dots,j_d)h$, is a real number.

When β → ∞ (zero temperature), the task boils down to finding the “ground state(s)”, i.e., the configuration(s) for which min E(u) is attained. This problem is solved very effectively by the usual multigrid algorithm, which for simplicity can be described as the alternating use of the following two steps. (i) Gauss-Seidel relaxation: The gridpoints are scanned in some prescribed order; at each point $x_j$ in its turn, the value of $u_j$ is changed so as to minimize E as far as possible. This process by itself can in many cases converge to the desired minimum, but the convergence will normally be slow, because smooth errors will be reduced very slowly. (ii) Coarse-grid correction is a correction of a given approximation u by a function of the form $I^h_H v^H$, where $v^H$ is a function defined on a coarser grid (e.g., with meshsize H = 2h), and $I^h_H$ denotes coarse-to-fine (H to h) interpolation, whose weights in any direction should generally reflect the strength of interactions in that direction. $v^H$ itself is selected so as to minimize $E(u + I^h_H v^H)$. To (approximately) calculate $v^H$, the resulting coarse-grid minimization problem is itself (approximately) solved by using again steps (i) and (ii). (Thus, a sequence of increasingly coarser grids is in fact recursively used.) Since $v^H$ approximates smooth errors very well, the overall process converges very fast. (See Secs. 3, 4, 5. In case local minima are obtained instead of global ones, elements of multilevel annealing should be incorporated; cf. Sec. 13.)
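For a quadratic energy $E(u) = \tfrac12 u^TAu - f^Tu$ the two steps take a concrete form: Gauss-Seidel minimizes E exactly in each $u_j$, and minimizing $E(u + I^h_H v^H)$ over $v^H$ gives the coarse system $(I^h_H)^T A\, I^h_H\, v^H = (I^h_H)^T(f - Au)$. The two-level sketch below assumes a 1-D model problem with Dirichlet boundaries and linear interpolation, and solves the coarse system directly instead of recursively.

```python
def gauss_seidel(A, f, u):
    """One sweep: each u_i in turn minimizes E(u) = 0.5*u^T A u - f^T u exactly."""
    n = len(u)
    for i in range(n):
        r = f[i] - sum(A[i][j] * u[j] for j in range(n) if j != i)
        u[i] = r / A[i][i]


def interpolation(nc):
    """Linear interpolation I_H^h from nc coarse to 2*nc+1 fine interior points (Dirichlet)."""
    P = [[0.0] * nc for _ in range(2 * nc + 1)]
    for J in range(nc):
        j = 2 * J + 1  # coarse point J coincides with fine point 2J+1
        P[j][J], P[j - 1][J], P[j + 1][J] = 1.0, 0.5, 0.5
    return P


def solve(M, b):
    """Dense Gaussian elimination without pivoting (M is symmetric positive definite here)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(M, b)]
    for k in range(n):
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x


def two_grid_cycle(A, f, u, P):
    gauss_seidel(A, f, u)                                  # step (i): relaxation
    nf, nc = len(P), len(P[0])
    r = [f[i] - sum(A[i][j] * u[j] for j in range(nf)) for i in range(nf)]
    AP = [[sum(A[i][k] * P[k][J] for k in range(nf)) for J in range(nc)] for i in range(nf)]
    Ac = [[sum(P[i][I] * AP[i][J] for i in range(nf)) for J in range(nc)] for I in range(nc)]
    rc = [sum(P[i][I] * r[i] for i in range(nf)) for I in range(nc)]
    v = solve(Ac, rc)                                      # step (ii): v^H minimizing E(u + I v^H)
    for i in range(nf):
        u[i] += sum(P[i][J] * v[J] for J in range(nc))
    gauss_seidel(A, f, u)                                  # post-relaxation
    return u
```

In the recursive algorithm the call to `solve` would be replaced by the same two steps applied on grid H.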

For finite β, relaxation is replaced by a Monte-Carlo process: at each $x_j$ in its turn, a new value of $u_j$ is randomly chosen according to the probability distribution (13.1) (given that all other components of u are fixed at their current values). When this is done many times over, a sequence of configurations is generated with “detailed balance”, i.e., with the property that, if at a certain stage in that sequence an equilibrium has been reached (meaning that the probability to have obtained any configuration u is the physical probability P(u)), then that will also be true in all subsequent stages, making the subsequent sequence representative enough for calculating the desired averages. In practice, equilibration is slow: equilibrium is (approximately) obtained only after many steps, because large-scale (i.e., smooth) deviations are slow to disappear. More seriously, even when equilibrium has been reached, the calculated averages are often very slow to converge and very expensive, because of the following four difficulties. (A) “Critical slowing-down” (typically occurring near critical temperatures, which are physically most important): the space of configurations is slowly sampled, because large-scale solution features are slow to change. (B) Slow balancing: deviations at all scales are slowly averaged out. If a standard deviation σ is contributed by the features of some scale, these features have to completely change $O((\sigma/\varepsilon)^2)$ times in order to obtain accuracy ε. (C) Domain vastness: large-scale features, physically very important (especially close to critical temperatures), obviously require very large grids to be simulated. (D) Fermionic interaction: in QCD problems, at each Monte-Carlo iteration, a system of lattice Dirac equations should be solved to account for the dynamics of fermions. Moreover, the change in the logarithm of the determinant of that system should in fact be calculated, possibly requiring an enormous amount of computation.

These four difficulties actually multiply each other, and therefore usually result in intractable calculations. Fortunately, they can all simultaneously be overcome by multilevelling.

We first describe how to eliminate the slow equilibration and the critical slowing. The first simple approach, in straightforward analogy with the zero-temperature case, is to alternate the usual Monte-Carlo process with a coarse-grid Monte-Carlo, in which only changes of the form $I^h_H v^H$ to a given fine-grid configuration u are considered. Detailed balance is preserved if the Hamiltonian (energy) governing this coarse Monte-Carlo is $E^H(v^H) = E(u + I^h_H v^H)$. By some pre-calculation this Hamiltonian can be rewritten in a simple form, quite similar to the given (i.e., the fine-grid) Hamiltonian. (Using the FAS formulation, such a coarse-grid Hamiltonian can quite generally be devised, even when E is not quadratic; see [12, §7.1]. It is interesting to note that with this formulation, external fields can generally be viewed as defect-corrections of some finer structures to a coarser physics.) In some cases (e.g., when large local deviations are improbable), this coarse Monte-Carlo represents well all moves which are slow to equilibrate in the usual (fine-grid) Monte-Carlo. In such cases, a two-level cycle, composed of a couple of fine-grid sweeps followed by coarse-grid equilibration followed by an additional couple of fine-grid sweeps, will nearly equilibrate the fine-grid configuration. The coarse-grid equilibration itself can rapidly be (nearly) obtained by similarly alternating between sweeps on that grid and (near) equilibration on a still coarser grid. This recursively yields a multigrid cycle. Since coarser sweeps are computationally much cheaper, the total work in such a cycle is only a fraction more than the work invested in the fine-grid sweeps. Thus, equilibrium (and hence also decorrelation) is nearly obtained in work equivalent to just a few Monte-Carlo sweeps. (This has been demonstrated by Goodman and Sokal [34].)
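A single coarse-grid Monte-Carlo move with $E^H(v^H) = E(u + I^h_H v^H)$ amounts to proposing $u \to u + \delta\, I^h_H e_J$ for one coarse variable J and accepting it by the Metropolis rule; with a symmetric proposal this preserves detailed balance. The sketch assumes a hypothetical 1-D periodic Gaussian model, $E(u) = \tfrac12\sum_i (u_{i+1}-u_i)^2$, linear-hat interpolation, and recomputes E from scratch instead of using the pre-calculated coarse Hamiltonian the text describes.

```python
import math
import random


def gaussian_energy(u):
    """E(u) = 0.5 * sum_i (u_{i+1} - u_i)^2 on a periodic 1-D lattice (illustrative model)."""
    n = len(u)
    return 0.5 * sum((u[(i + 1) % n] - u[i]) ** 2 for i in range(n))


def coarse_mc_sweep(u, beta, step, rng):
    """Metropolis sweep over the coarse variables v_J (seated at fine sites 2J, len(u) even).

    A move adds delta * (linear hat around site 2J) to u, i.e. v_J -> v_J + delta,
    and is accepted with probability min(1, exp(-beta * dE)).
    """
    n = len(u)
    for J in range(n // 2):
        delta = rng.uniform(-step, step)  # symmetric proposal
        hat = {(2 * J - 1) % n: 0.5, 2 * J: 1.0, (2 * J + 1) % n: 0.5}
        trial = u[:]
        for i, w in hat.items():
            trial[i] += delta * w
        dE = gaussian_energy(trial) - gaussian_energy(u)
        if dE <= 0 or rng.random() < math.exp(-beta * dE):
            u[:] = trial
```

With `beta=float("inf")` the sweep becomes the zero-temperature coarse-grid correction restricted to one coarse variable at a time, and the energy never increases.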

In most cases of interest, as Murphy would predict, this straightforward approach will not quite work, mainly because the probable slow-to-equilibrate moves cannot generally be characterized as having the form $I^h_H v^H$. For example, this is obviously the case for Ising spins. To be sure, coarse-level moves of Ising spins are feasible: they would typically consist of the simultaneous flipping of b × b squares (or b × b × b cubes), and of $b^\ell \times b^\ell$ squares at the ℓ-th level of coarsening. But such plain square flips will most often increase the energy very much (the more so the coarser the level) and will therefore most probably be rejected by the Monte-Carlo process. To employ probable coarse moves, the blocks being flipped should tend to be broken along “weak links”, i.e., at interactions which currently carry high energy (at violated bonds, in the case of Ising spins). To obtain such blocks and still maintain detailed balance, we propose to employ the following stochastic coarsening process.

Take first, for example, the d-dimensional Ising spin model, with variables $u_i = \pm 1$, where the sites are $i = (i_1,\dots,i_d)$, $1 \le i_\alpha \le n_\alpha$ (for convenience $n_\alpha$ will usually be a power of 2), and with the Hamiltonian

$$E(u) = -\sum_{i,j} J_{ij}\, u_i u_j, \tag{B.1}$$

where $J_{ij} \ne 0$ only for “neighboring” (in whatever sense) i and j. (Usually periodicity is assumed, where $i_\alpha = n_\alpha$ and $i_\alpha = 1$ are considered neighbors.) The blocking described above (e.g., with b = 2) can be interpreted as introducing “coarse spins” which are seated at the “coarse-grid” sites, e.g., the sites i for which all $i_\alpha$ are even. Each coarse spin represents a block of (fine-grid) spins, including the spin occupying the same site and some of its neighbors. Unlike the above pre-assigned blocks, however, the neighbors blocked together with each coarse spin will be determined in the following stochastic way.

Consider the candidate blocking of two spins, $u_i$ and $u_j$, say. This blocking would mean that in the coarse-level Monte-Carlo moves the two spins are flipped simultaneously; hence the interaction $V_{ij}(u) = J_{ij} u_i u_j$ is frozen: it does not change throughout those moves. In the stochastic coarsening process we only assign a probability $1 - P_{ij}$ to this freezing, while a probability $P_{ij}$ is assigned to simply deleting this interaction from the Hamiltonian governing the coarse MC moves, without blocking $u_i$ and $u_j$ together. It is easy to see that detailed balance is maintained provided (i) whatever is obtained, a freeze or a deletion, is maintained for the same MC moves (e.g., for the entire coarse MC); and (ii) the probability used is $P_{ij} = q_{ij}\exp(-\beta V_{ij}(u))$, where u is the current configuration and $q_{ij}$ is any non-negative constant (independent of u, but depending on the interaction), sufficiently small to ensure that $P_{ij} \le 1$ for any u.
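For Ising spins, $V_{ij}(u) = J_{ij}u_iu_j$ takes only the values $\pm J_{ij}$, so the largest admissible constant is $q_{ij} = \exp(-\beta|J_{ij}|)$. The sketch below computes the deletion probability with that default (a smaller $q_{ij}$ may be passed instead). With the default, a violated bond is always deleted, and a satisfied ferromagnetic bond is frozen with probability $1 - e^{-2\beta J_{ij}}$, which is the bond probability of the Swendsen-Wang cluster algorithm.

```python
import math


def deletion_probability(J_ij, beta, s_i, s_j, q_ij=None):
    """P_ij = q_ij * exp(-beta * V_ij(u)) with V_ij(u) = J_ij * s_i * s_j (Ising spins)."""
    q_max = math.exp(-beta * abs(J_ij))  # min over u of exp(beta * V_ij(u))
    if q_ij is None:
        # combine the exponents so that a violated bond gives exactly 1.0
        return math.exp(-beta * (abs(J_ij) + J_ij * s_i * s_j))
    if q_ij > q_max:
        raise ValueError("q_ij too large: P_ij would exceed 1 for some u")
    return q_ij * math.exp(-beta * J_ij * s_i * s_j)
```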

One by one, in any convenient order, we scan the fine-grid interactions and “kill” them, i.e., stochastically freeze (through spin blocking) or delete them. It is easy to calculate the new interactions between the formed blocks. These new interactions in turn are also killed, except for those formed between coarse spins (i.e., between blocks which include coarse-grid sites), which are kept “alive”. At the end there remain only coarse spins and a sequence of “independent” blocks, i.e., blocks without interactions. Most blocks are most likely small. (This is a simplified description. Actually, some one-dimensional sets of independent-block interactions could be left alive, and other coarsening patterns could be used.)
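In the limiting case in which every interaction is killed and none is kept alive between coarse spins, all resulting blocks are independent; since such a block has no interactions, both of its orientations are equally probable, and it can simply be flipped with probability 1/2. The move then reduces to a cluster flip of the Swendsen-Wang type. The sketch below shows only that limiting case: it freezes each bond with probability $1 - P_{ij}$ (using the maximal $q_{ij}$), forms the blocks with a union-find structure, and flips each block with probability 1/2. Keeping coarse-spin interactions alive and recursing, as described in the text, is not shown, and the bond-list format is an assumption.

```python
import math
import random


def find(parent, i):
    """Root of i's block (union-find with path halving)."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def stochastic_blocks(spins, bonds, beta, rng):
    """Kill every bond (i, j, J_ij): freeze it with probability 1 - P_ij, else delete it.

    Uses the maximal q_ij, so P_ij = exp(-beta*(|J_ij| + J_ij*s_i*s_j)); a violated
    bond (J_ij*s_i*s_j < 0) gets P_ij = 1 and is never frozen.
    """
    parent = list(range(len(spins)))
    for i, j, J in bonds:
        p_delete = math.exp(-beta * (abs(J) + J * spins[i] * spins[j]))
        if rng.random() >= p_delete:  # freeze: i and j are blocked together
            parent[find(parent, i)] = find(parent, j)
    return [find(parent, i) for i in range(len(spins))]


def flip_independent_blocks(spins, labels, rng):
    """Flip each block (identified by its root label) with probability 1/2."""
    flip = {r: rng.random() < 0.5 for r in sorted(set(labels))}
    return [-s if flip[r] else s for s, r in zip(spins, labels)]
```

Because violated bonds are never frozen, every block consists of identical spins, in line with the observation made in the text for the maximal choice of $q_{ij}$.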

The dynamics of the independent blocks is trivial, and their statistics are easily calculated. The other blocks, the coarse spins, interact through a Hamiltonian which has the same general form (B.1) as before, except that $u_i$ now represents the coarse spin at site 2i, and $n_\alpha$ now has half its former value. The sense of “neighborhood” is now slightly different, too, but most likely at most sites it is still very local (on the scale of the coarse grid). Hence the entire process can be repeated, creating in the same way increasingly coarser levels, each level consisting of a grid of spins (each representing a block of next-finer-level spins) and a list of independent blocks (including as a sublist the next-finer-level independent blocks).

The entire algorithm can be described as a sequence of multigrid cycles for the finest level, where a multigrid cycle for any given level (“the fine level”) is recursively defined as consisting of the following 5 steps. (i) A couple of MC sweeps are first made on the fine level, to settle to a local equilibrium. (ii) The next coarser level (“the coarse level”) is created from the fine level by the above stochastic coarsening process. (iii) γ multigrid cycles for the coarse level are performed. (iv) Each coarse spin whose final value is different from its initial value is translated into flipping the corresponding block of fine-grid spins, and each independent block of fine-grid spins is flipped with probability 1/2. The coarser level (its blocks, Hamiltonian, etc.) can now be erased. (v) Some additional MC sweeps can finally be made on the fine level.

If the maximal value is chosen for $q_{ij}$, namely $q_{ij} = \min_u \exp(\beta V_{ij}(u))$, then deletion is sure to occur at each violated bond (i.e., wherever $V_{ij}(u) < 0$). Hence, each block will consist of identical spins. Islands of one sign in a sea of the other sign will be blocked separately and therefore easily disappear in the Monte-Carlo process on a sufficiently coarse level. Also, new islands will easily appear. Most of the configuration will change in one cycle. The work in a cycle is dominated by the couple of finest-grid Monte-Carlo passes. Thus, decorrelation is obtained in work equivalent to just a few MC sweeps.

A similar process can easily be devised for any discrete-state or continuous-state system. The Hamiltonian is first written in the form $E(u) = -\sum_j V_j(u)$, where the straightforward coarse-to-fine interpolation $I^h_H$ would be equivalent to freezing some of the interactions $V_j$. But instead of straightforward freezing, each such interaction is frozen with probability $1 - P_j$ and deleted with probability $P_j = q_j \exp(-\beta V_j(u))$, where the best value for $q_j$ is perhaps the largest allowed, i.e., $q_j = \min_u \exp(\beta V_j(u))$. Then the interpolation is modified accordingly: its weights are larger in directions of stronger remaining interactions, keeping, of course, frozen all those interactions that escaped deletion. (The original $I^h_H$ itself should best be based on the strength of the original interactions. The exact form of $I^h_H$ is a main item of research required for each new model of basic interactions.) It is easy to see that detailed balance is maintained, and that (with proper choices of $I^h_H$) probable coarse-grid moves are created.

Eliminating the critical slowing down still leaves very significant slow balancing. Even at high temperatures, Ising spins should be flipped $10^{10}$ times in order to get five-digit accuracy in the average magnetization. At sufficiently high (or sufficiently low) temperatures, this slowness can be eliminated by changing the way statistics are extracted from the sequence of configurations; for example, by replacing each Ising spin with its expected value given its neighborhood (thus regarding the Monte-Carlo sequence as a process for creating detail-balanced neighborhoods). Such balancing of deviations is still very effective even at quite moderate temperatures, if those neighborhoods are suitably enlarged. But close to the critical temperature, similar balancing needs to be done at all scales. This can naturally be done with the multilevel Monte-Carlo outlined above: the deviations in the neighborhood of a spin are themselves balanced in terms of the coarser level, and so on recursively. (This may more conveniently be done with a different pattern of stochastic coarsening. Thus, generally, coarsening patterns should be chosen according to the purpose of the calculations.)
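A minimal sketch of the "expected value given its neighborhood" estimator, assuming a 2D periodic Ising lattice with energy $E = -J\sum_{\langle i,k\rangle} s_i s_k$ and no external field (assumptions for illustration). Given its four neighbors, spin $s_i$ has conditional mean $\tanh(\beta J h_i)$, where $h_i$ is the neighbor sum, so the magnetization can be averaged over these smooth values instead of over the raw $\pm 1$ spins.

```python
import math


def conditional_magnetization(spins, beta, J=1.0):
    """Average of E[s_i | neighbours] = tanh(beta * J * h_i) over a periodic 2D lattice.

    spins: list of rows of +1/-1 values; h_i is the sum of the four nearest
    neighbours of site i, with periodic wrap-around.
    """
    n, m = len(spins), len(spins[0])
    total = 0.0
    for i in range(n):
        for k in range(m):
            h = (spins[(i - 1) % n][k] + spins[(i + 1) % n][k]
                 + spins[i][(k - 1) % m] + spins[i][(k + 1) % m])
            total += math.tanh(beta * J * h)
    return total / (n * m)


def raw_magnetization(spins):
    """Plain estimator: the average of the spins themselves."""
    return sum(map(sum, spins)) / (len(spins) * len(spins[0]))
```

Both estimators have the same expectation under the equilibrium distribution; the conditional one simply removes the local part of the fluctuation.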

The above multilevelling can also be used to deal with vast domains: the latter should be simulated only on coarse levels. The coarser the level, the larger its domain. This can be achieved in various ways, depending on the nature of the problem and the desired statistics. One simple way is still to use the finest grid over the entire domain, but with only a few passes being made on that grid, since every such pass produces many fine-grid local samples, as against perhaps only one coarsest-level sample produced by a pass on the coarsest level. A multigrid cycle, as described above, with $\gamma = 2^d$, for example, would produce a comparable number of samples at all scales. If still larger domains are needed without more samples at the finest level, one can extend that level to the larger domain (using the periodicity, assuming periodic boundary conditions have been used), then create from it the next coarser level by the stochastic coarsening process and delete the finest level from subsequent processing. (In some problems the finest level will still subsequently be processed in some special zones, e.g., near boundaries.) The domain may later similarly be extended again and again, deleting each time the currently finest level. Arbitrarily large scales can be reached this way, including macroscopic dynamics.
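The sample-count claim is a short calculation. In a recursive cycle with cycle index $\gamma$, level $k$ ($k = 0$ the finest) is visited $\gamma^k$ times, and a pass over level $k$ of a $d$-dimensional grid touches about $N/2^{dk}$ sites, so a cycle yields about $\gamma^k N / 2^{dk}$ local samples on level $k$, independent of $k$ exactly when $\gamma = 2^d$. The helper names below are hypothetical.

```python
def visits_per_level(gamma, levels):
    """Count visits to each level in one recursive gamma-cycle (level 0 = finest)."""
    counts = [0] * levels

    def cycle(k):
        counts[k] += 1
        if k + 1 < levels:
            for _ in range(gamma):  # gamma recursive calls to the next coarser level
                cycle(k + 1)

    cycle(0)
    return counts


def samples_per_level(n_fine, d, gamma, levels):
    """Local samples produced on each level: visits times sites on that level."""
    visits = visits_per_level(gamma, levels)
    return [visits[k] * (n_fine // 2 ** (d * k)) for k in range(levels)]


# 64 x 64 grid (d = 2), four levels: gamma = 2**d = 4 balances all scales.
print(samples_per_level(64 * 64, d=2, gamma=4, levels=4))  # [4096, 4096, 4096, 4096]
print(samples_per_level(64 * 64, d=2, gamma=2, levels=4))  # [4096, 2048, 1024, 512]
```

Since the work per level is also balanced, such a cycle costs about the same on every level, i.e., $O(N)$ per level and $O(N \log N)$ per cycle.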

Alternatively, in the tradition of renormalization group techniques, instead of executing many coarsening steps, one can pass to the limit from the behavior at just a few such steps. Unlike the unbounded number of interactions per spin, and the unbounded types of interactions, required by renormalization group methods, the multilevel Monte-Carlo limit behavior can be derived directly from simple interactions (e.g., local and of the form (B.1), on all levels). A critical $\beta$, for example, can be determined from the behavior of the average interaction as a function of the coarsening level. Furthermore, much of the search for the critical $\beta$ can be confined to coarse-level processing.

The solution of the lattice Dirac equations, with associated matrix $Q$, and the calculation of $\delta \log\det Q$, are rapidly obtained by multilevel solvers (see Secs. 11 and 12). Moreover, these solvers can very nicely collaborate with the multilevel Monte-Carlo: they yield fine-to-coarse defect corrections to $(Q^{-1})_{ij}$ for neighboring $i$ and $j$, allowing $\delta \log\det Q$ to be followed also during coarse-level changes, without calculations on finer levels.

Philosophically  speaking,  the  multilevel  approach  outlined  above  can  fulfill  one  of 
the  ultimate  goals  of  computational  physics:  the  computational  derivation  of  macroscopic 
dynamics  from  microscopic  laws,  wherever  this  cannot  be  done  analytically.  The  best 
schemes  for  observing  macrodynamics  will  probably  be  FAS-like  (cf.  Sec.  7),  in  which  each 
coarse-grid  variable,  instead  of  representing  a  correction  to  be  interpolated  to  the  next 
finer  grid,  actually  represents  a  sum  of  that  correction  and  a  local  average  of  some  next- 
finer-grid  values.  In  the  above  simple  case  of  Ising  spins,  for  example,  FAS  formulation 
means  that  the  initial  value  of  a  coarse  spin  is  decided  by  the  current  configuration  in 
the  fine-grid  block  it  represents,  e.g.,  by  a  majority  rule.  On  increasingly  coarser  levels, 
the  spin  dynamics  thus  created,  monitored  by  any  statistics  of  interest,  will  tend  to  some 
macroscopic  dynamics.  Generally,  in  this  way,  sophisticated  dynamics  of  superstructures 
can  be  performed  and  observed  at  coarse  levels,  for  negligible  work,  maintaining  detailed 
statistical  balance.  Such  techniques  are  therefore  obvious  candidates  for  treating  turbulent 
flows,  too. 
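For the Ising case, the FAS initialization of coarse spins by a majority rule might look like the sketch below. The 2×2 blocks and the tie-breaking rule (keep the block's first spin) are assumptions made for illustration; the text specifies neither.

```python
def majority_coarsen(spins, b=2):
    """Initial coarse spins: majority vote over each b x b fine-grid block.

    Assumes the lattice dimensions are multiples of b. Ties (possible when
    b is even) are broken by the block's first spin, an arbitrary choice
    made here for illustration.
    """
    n, m = len(spins), len(spins[0])
    coarse = []
    for bi in range(0, n, b):
        row = []
        for bk in range(0, m, b):
            total = sum(spins[i][k] for i in range(bi, bi + b) for k in range(bk, bk + b))
            if total > 0:
                row.append(1)
            elif total < 0:
                row.append(-1)
            else:
                row.append(spins[bi][bk])
        coarse.append(row)
    return coarse
```

Applying the function repeatedly gives the initial spins of successively coarser levels; the subsequent coarse-level dynamics is what the statistics would monitor.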

Needless  to  say,  the  multilevel  Monte-Carlo  is  highly  parallelizable.  With  enough 
processors,  solution  times  will  be  small-order  polynomial  in  the  number  of  levels  employed, 
which  is  logarithmic  in  the  number  of  variables. 
Historical note. A special case of the above stochastic coarsening process is the “percolation clustering”, introduced into the Potts model by Kasteleyn and Fortuin [35], [36], and used for accelerating Monte-Carlo simulations by Sweeny [37] and Swendsen and Wang [38]. They developed it for a special type of interaction: the explicit interactions between two discrete-state particles (thus excluding the many-particle interactions that are typically frozen by continuous-state interpolations). More importantly, their stochastic switching-off/freezing process was carried out simultaneously throughout the lattice, so multilevel processes were not structured; hence acceleration was only partial (not encompassing all scales) and limited to special situations (where percolation occurs).


C.  High  Efficiency  Fluid  Dynamics  Solvers 


The treatment of steady-state Navier-Stokes and Euler equations by multigrid methods is becoming increasingly popular; dozens of papers appear each year; see, for example, these proceedings. In most cases these works superpose multigrid structures on previously developed approaches to discretization and relaxation. The resulting solvers show remarkable improvements in solution times over previous one-grid algorithms, but are still far from realizing the full potential of multilevelling, whether in terms of speed or in terms of obtained accuracy.

It is our general experience with multigrid algorithms that any PDE problem should be solvable, to $O(h^2)$ accuracy, in just a few (between 5 and 12 or so) “minimal work units”, where the minimal work unit is the amount of computer operations needed to express the simplest discretization of the problem on a uniform Cartesian grid with meshsize h. (The commonly used “relaxation work unit”, i.e., the work in one relaxation sweep, is often one or two orders of magnitude larger than this minimal unit, especially when complex discretizations (coordinate transformations, flux splitting, Riemann solvers, Runge-Kutta multiple steps, etc.) and/or multi-direction relaxation (alternating direction line, symmetric Gauss-Seidel, etc., especially in 3D) are used.)

That  ideal  efficiency  can  be  obtained  for  steady-state  fluid  dynamics  as  well,  provided 
the  following  principles  (i)-(viii)  are  observed.  Note  that  each  principle  contributes  a 
substantial  independent  factor  of  efficiency,  sometimes  even  an  order  of  magnitude  or 
more,  so  in  combination  they  can  make  a  big  difference  indeed. 

(i) Whenever applicable, the steady-state problem should be solved directly, not as a limit of time or pseudo-time evolution. For this purpose the problem should be well-defined by regarding it as a limit of an elliptic system (see [7, §20.1]). This is almost automatically done, at no additional processing cost, by the usual FMG algorithm, in which artificial viscosities and similar parameters are gradually reduced as the grids become finer (cf. Sec. 9).

(ii) Employ finite difference, rather than finite element, approximations on uniform Cartesian grids. Local refinements can be obtained by adding finer local uniform grids, which may each use a local coordinate system to fit local curves (boundaries, interfaces, streamlines, strong discontinuities; note that many of these are solution-dependent, hence cannot be fitted in advance). The FMG solver can naturally incorporate these finer levels, and execute self-adaptation processes as well [2], [7, §9]. Grid adaptation through global coordinate transformation is less locally adaptable; it is more expensive, both to generate and to operate, requires much more complicated and expensive smoothers (cf. Sec. 6) and much more storage, and does not allow grid staggering.

(iii) Staggered grids are more accurate than non-staggered ones and can therefore use coarser grids (sometimes twice coarser) to obtain the same accuracy. Incidentally, the multigrid solver itself has the same efficiency whether the grid is staggered or not. In the case of the Stokes or incompressible Navier-Stokes equations, for example, a central h-elliptic non-staggered discretization can be obtained by adding an “artificial pressure” term $-\beta h^2 \Delta_h p$ to the continuity equation, where $1/24 < \beta < 1/12$. The DGS smoother developed for the staggered equations [7, §18.3] is applicable to this non-staggered scheme, too, and yields the same smoothing factor.

(iv)  Solve  directly  the  nonlinear  equations  by  using  FAS  (cf.  Sec.  7):  do  not  linearize. 
Even  in  relaxation  no  linearization  is  needed,  since  the  equations  are  quasi-linear.  This 
saves  processing  and  a  lot  of  storage. 

(v)  Employ,  especially  on  the  finest  levels,  the  highly-efficient  distributed  Gauss-Seidel 
(DGS)  smoothers  [7,  §3.7].  The  DGS  relaxation  sweep  typically  costs  at  most  1.5  minimal 
work  units,  and  its  smoothing  factors  (defined  as  in  [7,  §20.3.1])  are  between  .27  and  .4 
(see  details  in  [7,  §20.3.4]). 

(vi) Use the 1-FMG algorithm (see Sec. 4), and incorporate into it any “outer” process such as continuation, grid adaptation, system optimization, etc. (cf. Sec. 9).

(vii)  Treatment  of  non-ellipticity  should  follow  the  rules  in  Sec.  6  above. 

(viii) New discretization schemes are often highly desirable in multigrid solvers, for the following reasons.

First, existing discretization methods may include slight defects, which may considerably damage accuracy. In many cases these defects have merely gone unnoticed until the introduction of multigrid solvers, whose convergence rates inevitably reveal them. One such defect typical in fluid dynamics is that the artificial viscosity (introduced, e.g., via upstreaming or flux splitting) is not always quite physical; e.g., it is not uniform and isotropic in some cases (such as flow separation) where its isotropy and uniformity are important.

Also, existing discretization schemes are plainly much too complex. Flux splitting and Riemann solvers for steady-state Euler equations can be replaced by the much cheaper use of artificial viscosities that directly imitate the main viscosity terms in the Navier-Stokes equations [7, §20.2]. The exact size and direction of the artificial viscosities can be selected to avoid straddling strong discontinuities and to allow significant interior influence only to the right boundary conditions. The rigorous theory attached to flux-splitting schemes is irrelevant anyway, since it applies to the initial-value one-space-dimension problem, not to the steady-state multi-dimensional one. In fact, those schemes fail at low Mach numbers, at oblique shocks and in some separated flows. They are also inextensible in case one needs to add a little physical viscosity somewhere in the flow. Moreover, substantially more accurate discretizations (especially for characteristic components) are obtainable by finite difference stencils which allow the use of diagonal neighbors, unavailable when the discretization is split.

Finally, it is important to realize that the multigrid process offers new possibilities in discretization: see Sec. 8. The “double discretization” scheme has proved particularly effective in our fluid dynamics codes, allowing the best combination of second-order accuracy, stability, correct shock direction and minimal shock smearing. (The main point to note when applying the double discretization scheme in the presence of shocks is that large residuals of opposing signs, appearing near a shock, should be cancelled against each other before being transferred to the coarser grid.)
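To make the artificial-pressure term of principle (iii) concrete, here is a minimal sketch (not from the paper) of the residual of the modified continuity equation at one interior point of a non-staggered grid. It assumes the continuity equation is written $u_x + v_y = 0$, so the modified equation reads $u_x + v_y - \beta h^2 \Delta_h p = 0$, with central differences for the divergence and the 5-point Laplacian $\Delta_h$. The array layout, the function name and the default $\beta = 1/16$ (chosen inside the quoted range) are assumptions. Since $\beta h^2 \Delta_h p$ is $O(h^2)$ for smooth $p$, the added term does not spoil second-order consistency.

```python
def continuity_residual(u, v, p, i, j, h, beta=1.0 / 16.0):
    """Residual of u_x + v_y - beta*h^2*Lap_h(p) = 0 at interior point (i, j).

    u, v, p are 2D lists indexed [i][j] with x = i*h, y = j*h (an assumed
    layout). Central differences for the divergence, 5-point Laplacian for
    the artificial pressure term.
    """
    if not (1.0 / 24.0 < beta < 1.0 / 12.0):
        raise ValueError("beta outside the range quoted in the text")
    div = (u[i + 1][j] - u[i - 1][j]) / (2 * h) + (v[i][j + 1] - v[i][j - 1]) / (2 * h)
    lap_p = (p[i + 1][j] + p[i - 1][j] + p[i][j + 1] + p[i][j - 1] - 4 * p[i][j]) / h ** 2
    return div - beta * h ** 2 * lap_p
```

For example, a divergence-free velocity with $p = x^2 + y^2$ leaves the residual $-4\beta h^2$, a pure $O(h^2)$ perturbation.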

The basic schemes we have developed for the compressible and incompressible Navier-Stokes and Euler equations are described in detail in [7, §§18-20]. For “cooked” known smooth solutions of the differential equations, at arbitrary Reynolds and Mach numbers, algebraic errors smaller than the $O(h^2)$ truncation errors were obtained by 1-FMG solvers employing W(1, 1) or W(2, 0) cycles, costing up to 10 minimal work units. As for problems with discontinuities, we have so far developed and tested our approach only for simplified scalar equations, such as $-\varepsilon \Delta u + (u^2)_x + u_y = f$, where $\varepsilon$ is positive but arbitrarily small. The lessons we have learned should now be translated to the Euler equations.

Summary.  Multigrid  techniques  best  operate  when  they  are  used  as  a  total  approach: 
not  just  a  fast  solver,  but  also  a  tool  for  superior  discretizations,  grid  transformations  and 
local  refinements;  a  way  of  avoiding  linearizations  and  meaningless  (far  below  truncation 
errors)  algebraic  convergence;  etc.  The  superposing  of  multigrid  schemes  on  a  variety  of 
other  methods  yields  far  less  efficient,  and  also  considerably  more  complicated,  codes.  On 
the  other  hand,  of  course,  to  adopt  a  totally  new  approach  is  initially  difficult;  it  requires 
new  research  and  development. 
ABSTRACT. The purpose of this paper is to describe the performance of a multigrid method implemented on a hypercube multiprocessor architecture. The basic aim is to show that multigrid can take very effective advantage of such architectures. In fact, we will demonstrate that the parallelizability of multigrid may be limited only by that of relaxation itself. We do this by showing that multigrid can be designed so that its interprocessor communication is completely accounted for during relaxation, so that, for practical applications, the cost of the coarser levels is negligible. (Loss of parallelism in multigrid certainly occurs where there are fewer points than processors. We show that such losses really have negligible impact on performance.) This demonstration uses a two-dimensional red-black multigrid fast Poisson solver implemented in both FORTRAN and C on an Intel iPSC 32-node hypercube.


1.  INTRODUCTION 

There  have  been  many  important  studies  done  with  multilevel  algorithms  in 
the  context  of  multiprocessor  computers.  (See  [1-34,  36-40].)  It  seems 
natural  to  consider  the  impact  of  the  availability  of  many  processors  on 
the  efficiency  of  multigrid  techniques.  The  present  paper  will  further 
this study in two essential ways. First, we will show, in contrast to the
other studies, that the interprocessor communication requirements of
multigrid  can  be  fully  taken  care  of  during  relaxation.  This  is  important 
by  itself,  but  it  has  special  significance  when  multigrid  solvers  are  used 
in  multilevel  adaptive  processing.  In  fact,  in  [18]  we  show  that  with 
multigrid  used  as  the  basic  "inner  loop"  grid  solver,  there  are  no 
additional  interprocessor  communication  requirements  for  AFAC  (the 
asynchronous  fast  adaptive  composite  grid  method).  Second,  we  will 
investigate  the  concern  that  multigrid  must  lose  parallel  efficiency 
because  its  coarser  grids  have  fewer  points  than  there  are  available 
processors.  We  will  show  that,  for  real  applications,  this  concern  is 
essentially  unfounded. 

We  focus  our  study  on  a  two-dimensional  red-black  multigrid  fast 
Poisson  solver  implemented  in  both  FORTRAN  and  C  on  a  32-node  Intel  iPSC 
hypercube.  Because  of  the  simplicity  of  this  study  and  the  existing 
literature  relevant  to  this  topic,  we  will  keep  our  discussion  to  the  bare 
essentials.


2.  BASIC  SCHEME 

Our  model  problem  is  the  Poisson  equation 

$$
\begin{aligned}
-\Delta u &= f \quad \text{in } \Omega = [0, 1]^2, \\
u &= g \quad \text{on } \partial\Omega.
\end{aligned}
\tag{2.1}
$$


The ingredients of the basic multigrid scheme include the usual 5-point finite difference discretization of (2.1) on a uniform grid. It uses red-black relaxation, half-injection, bilinear interpolation (to black points only), and standard V(1, 1), V(2, 1), FMV(1, 1) or FMV(2, 1) cycling. See [35] for a discussion of this scheme.
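For reference, one red-black Gauss-Seidel sweep for the 5-point discretization of (2.1) is sketched below, on an assumed $(n+1) \times (n+1)$ grid that includes the boundary, with "red" taken to mean $i + j$ even (both assumptions of the sketch). Each half sweep updates one color using only values of the other color, which is why the points of one color can be relaxed in any order, or concurrently.

```python
def red_black_sweep(u, f, h):
    """One red-black Gauss-Seidel sweep for -Lap_h u = f (5-point stencil).

    u, f: (n+1) x (n+1) lists of lists; the boundary rows/columns of u hold g.
    Red points (i + j even) are relaxed first, then black points.
    """
    n = len(u) - 1
    for color in (0, 1):
        for i in range(1, n):
            # first interior j with (i + j) % 2 == color
            start = 1 + (i + 1 + color) % 2
            for j in range(start, n, 2):
                u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1]
                                  + u[i][j + 1] + h * h * f[i][j])
    return u
```

Used alone, such sweeps converge slowly as h decreases; in the scheme described here they serve as the smoother inside the V- and FMV-cycles.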


3.  PARALLEL  ENVIRONMENT 

We  assume  that  the  reader  is  familiar  with  hypercubes  in  general,  the 
Intel  iPSC  in  particular,  gray  codes,  and  their  various  known  properties 
in relation to multigrid. It is especially important that hypercubes
support nearest-neighbor mesh arrays and that coarser processor meshes
have nearest-neighbor connections exactly two hypercube path lengths away.
See  [5,  6,  18,  29,  37]  for  further  discussions. 
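Both hypercube properties can be checked directly for a common embedding, assumed here and not necessarily the one used in the iPSC codes: processor $(i, j)$ of a $2^a \times 2^b$ mesh is mapped to the node number formed by concatenating the binary-reflected Gray codes of $i$ and $j$. Mesh neighbors then differ in one bit, and neighbors on a coarser processor mesh (stride $2^k$, $k \ge 1$) differ in exactly two bits. The function names are hypothetical.

```python
def gray(i):
    """Binary-reflected Gray code of i."""
    return i ^ (i >> 1)


def node_id(i, j, bits_j):
    """Hypercube node of mesh processor (i, j): Gray(i) bits followed by Gray(j) bits."""
    return (gray(i) << bits_j) | gray(j)


def hops(a, b):
    """Hypercube path length between nodes a and b (Hamming distance)."""
    return bin(a ^ b).count("1")


def neighbor_hop_counts(bits_i, bits_j, stride):
    """Distinct path lengths between neighbours on the processor mesh that keeps
    every stride-th processor in each direction (stride 1 = finest mesh)."""
    ni, nj = 2 ** bits_i, 2 ** bits_j
    counts = set()
    for i in range(0, ni, stride):
        for j in range(0, nj, stride):
            if i + stride < ni:
                counts.add(hops(node_id(i, j, bits_j), node_id(i + stride, j, bits_j)))
            if j + stride < nj:
                counts.add(hops(node_id(i, j, bits_j), node_id(i, j + stride, bits_j)))
    return counts


# A 32-node cube viewed as an 8 x 4 processor mesh.
for stride in (1, 2, 4):
    print(stride, neighbor_hop_counts(3, 2, stride))  # 1 {1}, then 2 {2}, then 4 {2}
```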


4.  PROCESSOR  ASSIGNMENTS 

We make the usual assignment of processors to the work load by domain decomposition. More precisely, we assign a rectangular mesh array of processors to the finest grid (which we assume is a uniform square grid on $\Omega$) in the natural way so that each processor contains a rectangular subgrid, all of which are approximately the same size (see Figure 1).


Figure  1 

Processor  Grid  Assignment 

Here we assign a 16-node hypercube to the 7 × 7
grid in $\Omega = [0, 1]^2$. Shown are the subgrids
assigned  to  the  (1,1),  (2,2)  and  (3,3)  processors 
on  the  4x4  mesh  array.  The  dotted  lines  indicate 
subgrid  interior  boundaries. 
Note that there is some flexibility in such assignments, with the two extreme cases being "squares" (as shown in Figure 1, where the subgrids are virtually square) and "strips" (where two opposing boundaries of each subgrid coincide with $\partial\Omega$). For simplicity, we will refer to these collectively as "boxes".

Each box consists of interior points and boundary points. We assume that the interior points correspond to true variables in the sense that the assigned processor will try to determine $u$ at those points. The boundary of each box either coincides with $\partial\Omega$, so the values of $u$ are preassigned there, or else it represents an "interior boundary", so it coincides with interior points of neighboring boxes.

We cite a few important properties of this processor assignment scheme.

i) The nearest grid point neighbors of a given interior point of a box are contained in the interior points of this box and its nearest neighbor subgrids. (Because we use 5-point discretizations here, this means that stencils reach into nearest neighbor boxes, not to diagonal neighbors. Actually, as the next property shows, this structure easily supports 9-point stencils.)

ii) Updating an approximation $u$ at interior boundary points can be done in two synchronized message passing steps. For example, all processors can first send values of $u$ at their respective northern-most and southern-most interior lines of grid points to their immediate northern and southern processor neighbors. This would then be repeated in the eastern and western directions. As we shall see, it is important that corner interior points be passed to the diagonal processor neighbors. Thus, the value of $u$ at the (2, 2) interior grid point in Figure 1 should be passed from the processor that owns it to its diagonal processor neighbor. But note that, after the northern pass, the owning processor's northern neighbor already contains the updated value $u_{2,2}$. Thus, this value can easily be forwarded to the diagonal neighbor by way of the eastern pass. This is the motive for performing these passes sequentially. (For 9-point discretization schemes, this synchronization needs to be part of each communication phase; for 5-point schemes, communication need only be synchronized after the last relaxation on each level.)
iii) It is now well known (cf. [6]) that this processor assignment to grid points essentially preserves nearest-neighbor properties on coarser multigrid levels. More precisely, let $\Omega^H$ be the natural subgrid of the finest grid $\Omega^h$ with mesh size $H = 2^k h$, where $k$ is a positive integer. Suppose we maintain the same processor assignment in the sense that, if $P$ is the processor assigned to a given box $B^h$ in $\Omega^h$, then it is also assigned to the box $B^H$ in $\Omega^H$ with the property that the interior points of $B^H$ are contained in $B^h$. Then the nearest domain neighbors of any point $p$ in $B^H$ are at most two processor path lengths away. That is, the nearest neighbor of $p$ is either in $B^H$, or it is in a box assigned to a processor that is at most two message path lengths away from $P$ in the hypercube architecture. Note that the interior boundary points of boxes will tend to drift into the domain of boxes that are farther away (see Figure 2) and that boxes will tend to disappear (i.e., lose all interior points) as the multigrid levels become coarser.
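The two-pass exchange of property (ii) can be simulated serially to see why the passes must be sequential. In this sketch (a hypothetical layout: a $P \times P$ array of processors, each owning a $b \times b$ block of a global array plus a one-point ghost layer), the first pass exchanges the owned parts of the edge rows, and the second pass exchanges whole edge columns, ghost entries included, so every corner value reaches the diagonal neighbor without a separate diagonal message.

```python
def make_blocks(glob, P, b):
    """Split a (P*b) x (P*b) global array into P x P local blocks.

    Each local block is (b+2) x (b+2): the b x b owned values plus a
    one-point ghost layer, initialised to None.
    """
    blocks = {}
    for I in range(P):
        for J in range(P):
            loc = [[None] * (b + 2) for _ in range(b + 2)]
            for r in range(b):
                for c in range(b):
                    loc[r + 1][c + 1] = glob[I * b + r][J * b + c]
            blocks[I, J] = loc
    return blocks


def exchange(blocks, P, b):
    """Two sequential passes: rows to I-neighbours, then full columns to J-neighbours."""
    # Pass 1 ("north/south"): send the owned part of the first and last owned rows.
    for (I, J), loc in blocks.items():
        if I > 0:
            blocks[I - 1, J][b + 1][1:b + 1] = loc[1][1:b + 1]
        if I < P - 1:
            blocks[I + 1, J][0][1:b + 1] = loc[b][1:b + 1]
    # Pass 2 ("east/west"): send whole edge columns, including the ghost rows
    # filled in pass 1; this is what carries corner values to diagonal neighbours.
    for (I, J), loc in blocks.items():
        for r in range(b + 2):
            if J > 0:
                blocks[I, J - 1][r][b + 1] = loc[r][1]
            if J < P - 1:
                blocks[I, J + 1][r][0] = loc[r][b]


def ghosts_match(glob, blocks, P, b):
    """True if every ghost entry that maps inside the global array holds the global value."""
    n = P * b
    for (I, J), loc in blocks.items():
        for r in range(b + 2):
            for c in range(b + 2):
                gr, gc = I * b + r - 1, J * b + c - 1
                if 0 <= gr < n and 0 <= gc < n and loc[r][c] != glob[gr][gc]:
                    return False
    return True
```

If the second pass sent only the owned part of each column (rows 1 to b), the corner ghosts would stay empty, which is the point of sequencing the passes.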


5. C-LEVEL

One of the major implementation concerns is that the coarser multigrid levels have significantly fewer points than there are processors. For example, in Figure 1 it can be seen that, on the finer levels, all but the boxes at the northern and eastern boundaries are nonempty; but, on the coarsest level, h = 1/2, only the (2, 2) box is nonempty. We say that the coarsest level at which essentially full concurrency is maintained is at C-level. Thus, level h = 1/4 is at C-level and level h = 1/2 is below C-level.
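Which boxes remain nonempty on each level, and hence where C-level falls, follows from the partition alone. The sketch below assumes (the text does not spell out the split) that the $n$ interior points of each direction are divided into contiguous blocks of $\lceil n/p \rceil$ points, so that the last box may be smaller, and that fine-grid index $x$ survives on level $k$ (mesh size $2^k h$) exactly when $2^k$ divides $x$. For the 7 × 7 grid on the 4 × 4 processor array of Figure 1, it gives nine nonempty boxes on level h = 1/4 and only box (2, 2) on level h = 1/2. The function names are hypothetical.

```python
def box_ranges(n, p):
    """1-based inclusive index ranges of n interior points split over p processors."""
    size = -(-n // p)  # ceil(n / p)
    return [(q * size + 1, min((q + 1) * size, n)) for q in range(p)]


def nonempty_boxes_1d(n, p, k):
    """1-based numbers of the boxes that still own points on level k (mesh 2**k h)."""
    step = 2 ** k
    return [q + 1 for q, (lo, hi) in enumerate(box_ranges(n, p))
            if any(x % step == 0 for x in range(lo, hi + 1))]


def nonempty_boxes_2d(n, p, k):
    """Boxes (i, j) of a p x p processor array that own points on level k."""
    rows = nonempty_boxes_1d(n, p, k)
    return [(i, j) for i in rows for j in rows]


for k, h in enumerate(("1/8", "1/4", "1/2")):
    print("h =", h, "nonempty boxes:", len(nonempty_boxes_2d(7, 4, k)))
```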


There are four basic approaches to treating this concern:

i) U-Cycles. The easiest approach is simply not to go below C-level. This means that the coarsest grid would be just one where all boxes, except those along the northern and eastern boundaries, are nonempty. (This gives the V-cycle the appearance of a U; hence, the name.) Note that this would degrade performance if C-level occurred at a small h.

ii) Shared Multigrid Solver. Another fairly simple approach is to have one processor (or possibly a group of processors) obtain all data from the others, complete the below-C-level portion of the U-cycle, then return its results to the respective processors for completion of the full V-cycle. Note that this requires two global communications. To eliminate the second one, so long as the processor communication structure allows it, each processor could obtain the necessary global information from all others and complete the below-C-level portion itself.

iii) Another Solver. Of course, the C-level equations could be solved by another method. The simplest is relaxation, which requires only that additional sweeps be taken on the coarsest grid in the U-cycle. Again, performance of multigrid would degrade here if this level is too fine.

iv) Sleeping Nodes. Another approach is to have all processors participate to the extent necessary to complete the V-cycle in the straightforward global fashion. This would mean that processors would begin to go idle, or to sleep, as the levels were coarsened, and would be awakened on the return to higher levels.

We will not be concerned with the relative merits of these approaches, but rather restrict our attention to the development of the last alternative and the study of the effects of processors going idle on the coarser grids. We do this by way of developing U-cycles first.


6. U-CYCLES

The major task in implementing a parallel U-cycle code is the preparation of a good scalar code. We started with a standard red-black 5-point stencil scheme. The only major preparation necessary was to allow the code to handle boundary drifts. Specifically, as we noted in Section 4, boundaries of the boxes tend to drift out farther into the domain $\Omega$ as the grids get coarsened (see Figure 2). This means that the scalar code must be prepared to handle the case that successively finer grids cover increasingly smaller domains.
Figure 2

Illustration  of  boundary  drift  for  the  (1,1)  box. 
The  large  X  is  the  coarse  grid  point  belonging  to 
the  larger  box. 


Once  the  scalar  code  has  been  carefully  prepared,  it  becomes  fairly 
easy  to  modify  it  for  implementation  on  a  hypercube.  We  assume  the  use  of 
a  host  for  receiving  input  and  sending  the  necessary  data  to  wake  up  and 
initiate  the  nodes.  Then  the  node  code  would  be  developed  from  the  scalar 
code  first  by  replacing  its  initialization  process  with  a  posted  receive 
from  the  host  followed  by  a  routine  for  determining  to  which  box  it  is 
assigned  and  other  initial  data  characterized  by  the  received  message. 

The  next  modification  needed  for  the  scalar  code  is  to  account  for 
the  communication  of  internal  boundaries  between  processors.  A  simple 
observation  that  seems  to  have  been  missed  by  others  is  that  this 
interprocessor communication is required only in relaxation. Thus, after
each  red  or  black  half  sweep,  the  processor  code  must  send  its  updated 
values  of  the  approximation  at  grid  points  neighboring  each  internal 
boundary  to  the  respective  processor  neighbors  (see  (ii)  of  Section  4). 
This  is  enough  to  ensure  that  each  processor  has  correct  data  to  allow  all 
other  routines  (e.g.,  residual  calculation  and  intergrid  transfer)  to 
execute  without  further  communication.  Because  we  are  never  below 
C-level,  this  means  that  these  neighbors  are  just  the  nearest  ones  in  the 
mesh  array. 
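The observation that communication is needed only in relaxation can be illustrated with a serial simulation of the "strips" decomposition. Each simulated processor keeps a private copy of the array but updates only its own band of interior rows, and the only communication is the exchange of edge rows after each red and each black half sweep. Because every update within a color reads only values of the other color, the distributed sweep reproduces the global sweep exactly. The decomposition, data layout and names are assumptions of this sketch.

```python
def half_sweep(u, f, h, color, rows):
    """Gauss-Seidel update of the points with (i + j) % 2 == color in the given rows.

    Solves -Lap_h u = f with the 5-point stencil; u and f are (n+1) x (n+1).
    """
    n = len(u[0]) - 1
    for i in rows:
        for j in range(1, n):
            if (i + j) % 2 == color:
                u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1]
                                  + u[i][j + 1] + h * h * f[i][j])


def strips_sweep(u, f, h, nproc):
    """One red-black sweep on nproc simulated processors owning strips of rows.

    Each processor updates only its own rows; edge rows are sent to the strip
    neighbours after every half sweep. Requires 1 <= nproc <= n - 1 so that
    every strip is nonempty. The owned rows are gathered back into u.
    """
    n = len(u) - 1
    bounds = [1 + q * (n - 1) // nproc for q in range(nproc + 1)]
    owned = [range(bounds[q], bounds[q + 1]) for q in range(nproc)]
    copies = [[row[:] for row in u] for _ in range(nproc)]  # private memories
    for color in (0, 1):  # red half sweep, then black half sweep
        for q in range(nproc):  # purely local relaxation
            half_sweep(copies[q], f, h, color, owned[q])
        for q in range(nproc):  # send first/last owned rows to the strip neighbours
            first, last = owned[q][0], owned[q][-1]
            if q > 0:
                copies[q - 1][first] = copies[q][first][:]
            if q < nproc - 1:
                copies[q + 1][last] = copies[q][last][:]
    for q in range(nproc):  # gather the owned rows, for comparison only
        for i in owned[q]:
            u[i] = copies[q][i][:]
    return u
```

Residual computation and intergrid transfers then need no further messages, since every processor already holds current values on its ghost rows.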

The  final  step  is  to  provide  communication  back  to  the  host  for 
output  of  intermediate  and  final  error  estimates  and  the  final 
approximation.
7. V-CYCLES

Once the U-cycle code has been developed, modifying it to produce a true
V-cycle using sleeping nodes is a fairly complicated task. The three
major  areas  of  modification  are: 

i) Initialization. At the outset, each processor must determine the level at which it goes to sleep (i.e., at which its box becomes empty) and what its processor neighbors are on each level. Also, each must determine the coloring order of its grid points. (The southwestern grid point of each box above C-level is always red, but at and below C-level it may be black. This must be determined for correct processing by each processor.) These tasks involve a substantial amount of modular arithmetic, but are otherwise fairly straightforward.

ii) Putting Processors to Sleep. During the fine-to-coarse phase of the V-cycle, processors must put themselves to sleep on the level at which they become empty. This can easily be achieved by such processors posting receives of messages that will come from their neighbors only when they are to be awakened.

iii) Waking Processors. During the coarse-to-fine phase of the V-cycle, processors must be awakened at the level at which their boxes become nonempty. The basic idea here is that the sleeping processors are waiting for messages to be received from the processors that are thereby going to wake them.
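The modular arithmetic of the initialization step (i) can be sketched for the box layout of Figure 1. The sketch assumes contiguous blocks of $\lceil n/p \rceil$ interior points per direction, fine-grid index $x$ lying on level $k$ exactly when $2^k$ divides $x$, indices increasing toward the north and east, and "red" meaning an even sum of level-$k$ indices; all of these, and the function names, are assumptions. Each processor computes the first level on which its box is empty and the color of its southwestern point on each level where it is awake.

```python
def box_range(n, p, q):
    """1-based inclusive fine-grid indices owned by box q (1-based) in one direction."""
    size = -(-n // p)  # ceil(n / p)
    return (q - 1) * size + 1, min(q * size, n)


def coarse_indices(n, p, q, k):
    """Level-k indices (fine index / 2**k) of the points box q owns in one direction."""
    lo, hi = box_range(n, p, q)
    step = 2 ** k
    return [x // step for x in range(lo, hi + 1) if x % step == 0]


def sleep_level(n, p, qi, qj, levels):
    """First level k (0 = finest) on which box (qi, qj) owns no points, else None."""
    for k in range(levels):
        if not coarse_indices(n, p, qi, k) or not coarse_indices(n, p, qj, k):
            return k
    return None


def southwest_color(n, p, qi, qj, k):
    """'red' if the box's first level-k point has an even index sum, else 'black'."""
    i0 = coarse_indices(n, p, qi, k)[0]
    j0 = coarse_indices(n, p, qj, k)[0]
    return "red" if (i0 + j0) % 2 == 0 else "black"
```

For the 7 × 7 grid with three levels, box (4, 4) goes to sleep on level h = 1/4, box (1, 1) on level h = 1/2, and box (2, 2) never sleeps; every southwestern point is red on the finest level, while box (1, 2) starts with a black point on level h = 1/4.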

We have taken two different approaches to waking processors. For the FORTRAN code we developed, during the coarse-to-fine phase of the V-cycle, processors communicate with their proper neighbors after each half sweep. Below C-level and just before proceeding to a finer level, each processor checks to see who its neighbors will be there. If they are different, then they must be processors that need awakening, so they are each sent messages that contain current updates for their boundary data. These newly awakened nodes in turn send their data to two of their neighbors that must also be awakened (see Figure 3).

Figure 3

Here we illustrate active processors waking
their sleeping neighbors. We illustrate the
five message passing phases with a few examples.

The C-code approach is a little more involved but generally more efficient. The basic idea is that the final message passing sequence in relaxation is modified so that the messages (sent to nearest neighbors on a given level) go through the nodes that are to be awakened for the next finer level. First the active nodes wake their sleeping neighbors to the west. These newly awakened nodes then act as conduits of the boundary data between east and west active neighbors. The newly awakened nodes then wake up their north and south "late" sleepers, while the active nodes repeat the process in the north and south directions. This is all accomplished in five message passing phases (see Figure 3):

1.  Active  nodes  pass  west. 

2.  Newly  awakened  nodes  share  data  with  their  western  active  neighbors. 

3.  Newly  awakened  nodes  pass  east  while  active  nodes  pass  north. 

4. Newly awakened nodes pass north to wake the late sleepers; active
nodes pass south to their now newly awakened neighbors.

5.  All  newly  awakened  neighbors  pass  south. 
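To make the ordering of the five phases concrete, here is a small Python simulation. It is a sketch under assumptions not stated in the text: processors form a periodic 2D grid indexed (x, y), "west" is x - 1 and "north" is y + 1, the nodes active on the coarser level are exactly those with both coordinates even, and "all newly awakened" in phase 5 is read as every node woken in phases 1, 3 and 4. The helper `send` asserts that every message comes from an awake node and that, unless it is a wake-up message, it reaches a node that is already awake, so a run checks that the schedule never relies on a sleeping processor.

```python
def wake_phases(nx, ny):
    """Simulate the five wake-up message phases on a periodic nx-by-ny processor grid.

    Illustrative assumptions: processor (x, y) has west neighbor (x-1, y) and north
    neighbor (x, y+1), with wraparound, and the processors active on the coarser
    level are exactly those with x and y both even.
    Returns (set of awake processors, list of (phase, sender, receiver) messages).
    """
    assert nx % 2 == 0 and ny % 2 == 0

    def west(p):
        return ((p[0] - 1) % nx, p[1])

    def east(p):
        return ((p[0] + 1) % nx, p[1])

    def north(p):
        return (p[0], (p[1] + 1) % ny)

    def south(p):
        return (p[0], (p[1] - 1) % ny)

    active = {(x, y) for x in range(0, nx, 2) for y in range(0, ny, 2)}
    awake = set(active)
    log = []

    def send(phase, sender, receiver, wakes=False):
        # Only awake nodes may send; ordinary (non-wake-up) messages must reach awake nodes.
        assert sender in awake
        assert wakes or receiver in awake
        awake.add(receiver)
        log.append((phase, sender, receiver))

    newly_w = {west(p) for p in active}   # woken from the east in phase 1
    newly_n = {north(p) for p in active}  # woken from the south in phase 3
    late = {north(q) for q in newly_w}    # the "late" sleepers, woken in phase 4

    for p in active:                       # 1. active nodes pass west
        send(1, p, west(p), wakes=True)
    for q in newly_w:                      # 2. share data with western active neighbors
        send(2, q, west(q))
        send(2, west(q), q)
    for q in newly_w:                      # 3. newly awakened nodes pass east ...
        send(3, q, east(q))
    for p in active:                       #    ... while active nodes pass north
        send(3, p, north(p), wakes=True)
    for q in newly_w:                      # 4. newly awakened nodes wake the late sleepers
        send(4, q, north(q), wakes=True)
    for p in active:                       #    ... while active nodes pass south
        send(4, p, south(p))
    for q in newly_w | newly_n | late:     # 5. all newly awakened nodes pass south
        send(5, q, south(q))
    return awake, log


awake, log = wake_phases(8, 4)
print(len(awake), len(log))  # 32 80
```

On an 8 x 4 grid (32 nodes, as on the iPSC used below) every processor ends up awake, and no assertion fires.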


8. FMV-CYCLES

In principle, there is no real difficulty in converting the V-cycle code to
incorporate full multigrid. The only significant difference is that most
nodes are asleep at the start; since FMV is based on V-cycles, all other
aspects of the code are essentially the same.
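Since FMV is built from V-cycles, the order in which it visits levels can be generated directly from the V-cycle order. The following Python sketch lists that order (level 1 is the coarsest grid) and counts how many level visits fall below a given level. The parameter `cycles_per_level` and the helper names are hypothetical, and the sketch counts visits rather than measuring time.

```python
from fractions import Fraction


def v_cycle_levels(top):
    """Levels visited by one V-cycle started on level `top` (level 1 is the coarsest)."""
    return list(range(top, 0, -1)) + list(range(2, top + 1))


def fmv_levels(m, cycles_per_level=1):
    """Level order for an FMV schedule on levels 1..m built from V-cycles.

    The schedule starts on the coarsest level; each finer level 2..m is entered
    by interpolation and then treated with `cycles_per_level` V-cycles.
    """
    schedule = [1]
    for level in range(2, m + 1):
        for _ in range(cycles_per_level):
            schedule += v_cycle_levels(level)
    return schedule


def fraction_below(schedule, level):
    """Fraction of level visits strictly below `level` (a count, not a time)."""
    return Fraction(sum(1 for k in schedule if k < level), len(schedule))


print(fmv_levels(3))                         # [1, 2, 1, 2, 3, 2, 1, 2, 3]
print(fraction_below(fmv_levels(5), 3))      # 13/25
print(fraction_below(v_cycle_levels(5), 3))  # 1/3
```

For m = 5 and a cutoff at level 3, 13 of the 25 FMV visits lie below the cutoff, against 3 of 9 for a single V-cycle, which is one way to see that FMV spends proportionally more of its schedule on the coarse levels where nodes sleep.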
9. TIMING RESULTS

We ran various timing tests on the two codes we developed using a
dedicated 32-node Intel iPSC. We report here only results for the C code,
because its node waking and boundary treatment methods make it more
efficient below C-level. At higher levels, the C code is actually slightly
slower, because the Intel compilers generally produce executable code for
FORTRAN that is somewhat faster than that for C in large applications.

Tables I, II and III display the results of these timing tests for
V(1, 1) using strips, V(1, 1) using squares, and FMV(2, 1) using strips,
respectively. By "squares" we mean that the boxes are assigned in a regular
way so that they are as square as possible. Thus, for a 32-node assignment,
the decomposition is 4 x 8. The decomposition for strips is 1 x 32.

The tables give the total execution time, t, in milliseconds for
various values of the problem size, m, and the cube size, d. Thus, the entry
in row m and column d corresponds to a subcube of $2^d$ nodes applied to the
$(2^m - 1)$ by $(2^m - 1)$ finest grid. In parentheses we include estimates
of the percentages of time that the code spent in communication, p, and
below C-level, q. p was determined simply by measuring the effect of
commenting out the communication statements in the code. (This produces
incorrect numerical results, of course.) q was determined not by direct
timings, but by comparing the times for a coarse grid version of the
cycling scheme on the next smaller subcube. For V-cycles, we simply compared
the (m-1, d-1) entry for t with the (m, d) entry. For FMV-cycles, we used
the time of an FMV-cycle for (m-1, d-1) with an extra V-cycle on each level.
To take into account the increased cost of sending messages two path
lengths away, we adjusted each of these figures upward by 15%. This
percentage was determined by timing the difference between one- and
two-path-length messages and by accounting for the percentage of time that
the codes spend in communication.

Finally,  Table  IV  contains  typical  results  of  communication  costs  (in 
milliseconds)  for  FMV(2,  1)  using  strips. 

We  now  make  several  comments  and  observations: 

i)  Sections  of  the  tables  are  completely  missing  either  because  we 
could  not  easily  solve  such  large  problems  (upper  entries)  or 
because  the  corresponding  problems  started  off  below  C-level. 


(%  time  in  below  C-level,  %  time  in  communication) 
Table II. t(p, q) for V(1, 1) Squares

Table IV. FMV(2, 1) Strips: Communication costs in milliseconds

 m \ d      5      4      3      2      1      0
   10   5,252     -      -      -      -      -
    9   3,676  3,537  3,439     -      -      -
    8   2,575  2,464  2,287  1,761    691     -
    7   1,794  1,686  1,557  1,227    410    116
    6   1,202  1,118  1,011    809    287     82
    5     777    668    659    511    193     56
    4      -     405    348    293    140     34
    3      -      -     151    120     51     18
    2      -      -      -      33     51      6
    1      -      -      -      -       2      0


ii) C-level occurs here at $m = d$ for strips and at $m = \lceil d/2 \rceil$ for
squares. ($\lceil x \rceil$ is the ceiling integer function.) Thus, C-level
rises at every step from left to right in Tables I and III and at
every other step in Table II, and C-level is much lower in Table II.
Moreover, the effect of sleeping processors is more dramatic on
strip decompositions, as evidenced by the higher percentages,
q, in Tables I and III.
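The rule in ii) can be written as a pair of small helper functions. This Python sketch assumes that "as square as possible" means a $2^{\lfloor d/2 \rfloor} \times 2^{\lceil d/2 \rceil}$ processor grid, which reproduces the 4 x 8 layout quoted for 32 nodes; the function names are illustrative. The loop also checks an observation of ours: in both layouts, C-level coincides with $\log_2$ of the longer side of the processor grid.

```python
import math


def decomposition(d, layout):
    """Processor-grid shape (rows, cols) for 2**d nodes; "squares" is as square as possible."""
    if layout == "strips":
        return (1, 2**d)
    if layout == "squares":
        return (2**(d // 2), 2**(d - d // 2))
    raise ValueError(f"unknown layout: {layout!r}")


def c_level(d, layout):
    """Finest-grid exponent m at which C-level occurs: m = d (strips), ceil(d/2) (squares)."""
    if layout == "strips":
        return d
    if layout == "squares":
        return math.ceil(d / 2)
    raise ValueError(f"unknown layout: {layout!r}")


for d in range(6):
    rows, cols = decomposition(d, "squares")
    # C-level equals log2 of the longer side of the processor grid.
    assert c_level(d, "squares") == max(rows, cols).bit_length() - 1
    assert c_level(d, "strips") == decomposition(d, "strips")[1].bit_length() - 1
    print(d, (rows, cols), c_level(d, "squares"), c_level(d, "strips"))
```

Stepping d from 0 to 5, the squares value goes 0, 1, 1, 2, 2, 3 (rising every other step) while the strips value rises every step, as stated in ii).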

iii) When the finest grid is at or just above C-level, both the
V-cycle and the FMV-cycle spend a substantial portion of their
time below C-level, where processors are often asleep. But just
a few levels above this, the picture is quite different. This
gives some limited evidence for the claim that, except for very
small problems (relative to the size of the cube), multigrid
parallelism does not degrade because of processor idleness on
coarser levels. We will examine this further in the next
section.

iv) The top figures (largest values of m) are consistent within each
table. More precisely, it can be seen from each of these tables
that, well above C-level, the cost of the respective cycles
depends on the number of points per processor, and little else.
v) An interesting feature we observed from these tests is that the
communication costs for a given cycle and fixed m grew very
slowly with increasing d. This is evident from Table IV. It
suggests that increasing the number of processors to tackle a
given problem will result in only a marginal loss of efficiency in
terms of communication.


10.  THUMBNAIL  COMPLEXITY  ESTIMATES 

The results of the last section suggest that coarse grid processor
idleness is not a serious problem for large-scale multigrid applications.
Yet these results are for fairly small hypercubes ($d \le 5$). To get a
picture at a much larger scale, in this section we make rough estimates of
the parallel complexity of general V-cycles. We do this for arbitrary d
and m in general dimension D, using generalized strips and squares
(slabs and cubes in D = 3 dimensions).

We  make  the  following  simplifying  assumptions: 

i)  The  dimension  of  the  cube  is  d. 

ii) The grid contains essentially $2^m$ points in each of the D
coordinate directions.

iii) The time that the V-cycle spends in arithmetic on a given level
is $cn$, where n is the maximum number of points per processor for
that level.

iv) The time that the V-cycle spends in communication on a given
level is

$$\begin{cases} \alpha + \beta\ell & \text{at or above C-level},\\ 2(\alpha + \beta\ell) & \text{below C-level}, \end{cases}$$

where $\ell$ is the total length of all messages. ($\alpha$ reflects the
start-up cost of sending a message, while $\beta$ reflects the
dependence of message passing cost on message length. Suitable
values for $\alpha$ and $\beta$ depend on the number of communication
stages, the character of the machine, and other details of the
environment.)

Table V contains the cost components at and above C-level and below
C-level. This is done for both generalized strips and squares. The
results are displayed so as to facilitate comparison of the components
for arithmetic and communication costs. Table VI gives the order of the
ratios of these cost coefficients in terms of the number of processors
($P = 2^d$) and the number of points per processor on the finest grid
($N = 2^{mD-d}$).

Table V. Complexity of V-cycle, broken down by cost at and above
C-level and below C-level, for generalized strips and squares.

Table VI. Order of the ratios of the coefficients in Table V,
where $P = 2^d$ and $N = 2^{mD-d}$.
The complexity advantages of generalized squares for very large-scale
applications are apparent. This also suggests that, as long as the
capacity of individual nodes grows with increasing dimension of the
hypercube (and this growth can be fairly slow for squares), processor
idleness will not be a severe problem for multigrid schemes.
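The arithmetic (c) coefficients behind this comparison follow from assumptions ii) and iii) alone, by summing the maximum number of points per processor over the levels. The Python sketch below does that sum directly and compares it with geometric-series closed forms that we derived under the same assumptions (they are not transcribed from Table V). It further assumes $D \ge 2$ and $1 \le d \le m$ for strips, and that D divides d for squares, so that C-level sits at level $d/D$; the $\alpha$ and $\beta$ terms are not modeled.

```python
def arithmetic_by_summing(m, D, d, layout):
    """Coefficients of c (at/above C-level, below C-level) by summing level by level.

    Level k has 2**k points per direction (assumption ii); its arithmetic cost is
    c * n_k with n_k the maximum number of points per processor (assumption iii).
    """
    if layout == "strips":
        c_lvl = d                    # one direction split over 2**d nodes
    elif layout == "squares":
        assert d % D == 0, "sketch assumes D divides d"
        c_lvl = d // D               # every direction split over 2**(d/D) nodes
    else:
        raise ValueError(f"unknown layout: {layout!r}")
    above = below = 0
    for k in range(1, m + 1):
        if k >= c_lvl:
            above += 2**(k * D - d)   # all 2**(kD) points shared by 2**d nodes
        elif layout == "strips":
            below += 2**(k * (D - 1))  # each active node holds one (D-1)-dim slab
        else:
            below += 1                # each active node holds a single point
    return above, below


def arithmetic_closed_form(m, D, d, layout):
    """Geometric-series closed forms of the same two coefficients (our derivation)."""
    if layout == "strips":
        above = 2**(d * (D - 1)) * (2**(D * (m - d + 1)) - 1) // (2**D - 1)
        below = 2**(D - 1) * (2**((D - 1) * (d - 1)) - 1) // (2**(D - 1) - 1)
    else:
        s = d // D
        above = (2**(D * (m - s + 1)) - 1) // (2**D - 1)
        below = s - 1
    return above, below


for D in (2, 3):
    for d in range(1, 7):
        for m in range(d, d + 4):
            assert arithmetic_by_summing(m, D, d, "strips") == arithmetic_closed_form(m, D, d, "strips")
    for s in range(1, 4):
        for m in range(s, s + 4):
            assert arithmetic_by_summing(m, D, D * s, "squares") == arithmetic_closed_form(m, D, D * s, "squares")

print(arithmetic_closed_form(10, 2, 10, "strips"))   # (1024, 1022)
print(arithmetic_closed_form(10, 2, 10, "squares"))  # (1365, 4)
```

For D = 2 and m = d = 10, strips spend about as much arithmetic below C-level (1022) as at and above it (1024), while squares spend 4 against 1365, which is the advantage of generalized squares in its simplest form.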


11.  CONCLUDING  REMARKS 

Our present approach was motivated by the concern that idle processors on
coarser grids can severely limit multigrid efficiency. We believe that
our work shows that this concern is essentially unfounded. In fact, we
have arrived at the somewhat stronger conclusion that the time multigrid
spends while processors are idle is negligible. That is, it is
unnecessary to try to achieve a speed-up of P on coarser grids because
their cost is relatively insignificant. This is true for various ranges
of c, α and β. In other words, despite the communication-arithmetic
properties of the hypercube, processor idleness is of no concern for real
applications.

Our arguments here are based on the premise that advanced machines
are meant to solve very large-scale problems. By this we mean both that
problems will tend to tax them (i.e., N is on the order of machine
capacity) and that the machines have powerful nodes (i.e., machine capacity
is large compared to the number of processors). It may be argued, however,
that multigrid is swamped by idle processors for very fine grained
problems (i.e., where N is of order unity). Even here, though, the question
is not so much processor idleness as overall speed, and multigrid remains
effective in that respect.
We apply the multigrid method to the problem of computing quark
propagators  in  lattice  Quantum  Chromodynamics.  Renormalization  group 
considerations  lead  to  a  modification  of  the  scaling  of  parameters  such  as 
the  bare  quark  mass  in  the  Dirac  operator  when  it  is  projected  from  a 
fine  lattice  onto  a  coarse  lattice.  Extensions  of  the  multigrid  methods  to 
the  update  algorithm  for  the  Monte  Carlo  gauge  configuration  and  the 
fermionic  determinant  are  also  suggested. 

1.  INTRODUCTION 

Lattice  computations  for  quantum  field  theories  near  the  continuum  limit  and  critical 
phenomena  in  statistical  mechanics  are  impeded  by  critical  slowing  down.  The  problem  is 
that  most  of  the  iterative  or  relaxation  methods  involve  local  changes  that  propagate  at  a 
fixed  velocity  (in  iteration  time)  across  the  lattice.  Consequently,  as  the  correlation  length 
grows, a larger volume of the lattice must relax and the convergence rate degrades. In the
context of numerical analysis of partial differential equations, "multigrid" methods have
been developed to cope with just this situation. [1] Similarly, in the context of quantum
field theories, a related set of techniques referred to as "renormalization group" methods
has offered useful insights into critical phenomena and led to some computational tools
for dealing with them. [2-3]

This paper combines techniques from multigrid and renormalization group methods and
applies them to the computation of the propagators for fermions in Quantum Chromodynamics
(QCD). We also outline some applications of these ideas to problems of computing
gauge configurations and internal fermion corrections. In the case of the fermion propagator
of QCD, the critical slowing down is a result of taking the quark mass toward zero in
order to reach the physically small pion mass. In fact, in the simulations performed to date,
large errors are introduced by having to extrapolate in the quark mass beyond the region
where the convergence rate is acceptable. Since the fermion propagator is the solution of
a first order linear differential equation, this is a natural place to apply the conventional
multigrid analysis for PDEs. On the other hand, some new features are introduced
both by the need to preserve gauge invariance and by our ability to rely on the
renormalization group to improve scaling for the bare parameters. Understanding these aspects will
help to extend the multigrid method to other aspects of the full simulation of QCD.

Our first computational goal is to exhibit the features of the multigrid analysis for
gauged PDEs and to compare the acceleration in convergence with that of the very
interesting Fourier acceleration technique, [4] and next to develop a multigrid method for the
gauge field updates, as has been recommended by several authors. [5]


2.  MULTIGRID  FOR  WILSON  LATTICE  FERMIONS 

Computations in lattice QCD [6-8] have two major elements: generating representative
samples of gauge configurations by Monte Carlo sampling, and including
the effects of the fermions (quarks) by solving the Dirac equation. In applying the multigrid
method to Wilson's formulation of lattice QCD, [9] we shall assume in our present
analysis that the appropriate gauge fields $U_\mu(x) = \exp(igA_\mu(x))$ have been generated by
some kind of Monte Carlo simulation with the correct probability weight, proportional to
$\exp(-S_{gauge}/g^2)$. Thus we are left with the remaining problem of solving a linear partial
differential equation for the Dirac equation, or more precisely a finite difference version of
the Dirac equation,

$$\sum_y L(x, y)\,\Psi(y) = b(x), \qquad (2.1)$$

where the matrix L depends on the background gauge fields $U_\mu(x)$ in a manner to be
made precise shortly. L is a finite difference matrix on a 4D hypercubic lattice with
sites $x, y = a(n_1, n_2, n_3, n_4)$, where a is the lattice spacing and $n_i = 1, 2, \dots, N_i$. (We
take the number of sites in every direction to be a power of 2, $N_i = 2^k$, with periodic
boundary conditions. Throughout, we specify these details to be precise about our current
simulations, realizing that they are often immaterial to the general multigrid techniques
being developed.)

The fermion Green's function is the solution $G(x, 0) = \Psi(x)$ with a point source
$b(x) = \delta(x - 0)$. In Eq. (2.1) we have suppressed the indices for spin $(i, j = 1, 2, 3, 4)$
and color $(\alpha, \beta = 1, 2, 3)$. In greater detail, this equation for Wilson fermions is

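$$\sum_{\mu} \tfrac{1}{2}\left(\gamma^{\mu}_{ij} + R\,\delta_{ij}\right) U^{\alpha\beta}_{\mu}(x)\,\Psi^{\beta}_{j}(x + \mu) - (M + 4R)\,\Psi^{\alpha}_{i}(x) = b^{\alpha}_{i}(x). \qquad (2.2)$$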

The matrix $L(x, y)$ is sparse in the space-time lattice sites, with contributions coming
only from the nearest neighbor sites. The outgoing links from site x are labeled by the 8
lattice vectors $\mu = a(\pm\hat 1, \pm\hat 2, \pm\hat 3, \pm\hat 4)$, as shown in Figure 1. Each positively oriented
link carries non-sparse matrices $(\gamma^\mu + R)$ for spin and $U_\mu(x)$ for QCD colour, with the
convention that for links in the negative direction the corresponding matrices are given by

$$\gamma^{-\mu} = -\gamma^{\mu}, \qquad U_{-\mu}(x) = U^{\dagger}_{\mu}(x - \mu). \qquad (2.3)$$

In general, we will fix R = 1 and discuss convergence as a function of one
bare parameter, the quark mass M. Typically, standard iterative techniques applied to this
linear problem converge well for quark masses $a(M - M_{cr}) > 0.04$, for $g = 1$ ($\beta_{QCD} = 6$),
with $M_{cr} = -0.8031$. This requires substantial extrapolation to the physical value
$a(M - M_{cr}) \approx 0.0025$, which is an order of magnitude closer to the critical point. [10]

FIG. 1. Fine grain lattice sites (o and ×) with spacing a, and the four orthogonal
lattice vectors $\mu_n = a\hat n$. The coarse grain lattice has spacing $a' = 2a$, sites designated by
×, and lattice vectors $2\mu_n$.

FIG. 2. Two level multigrid cycle. Primes denote coarse lattice variables, and
hats denote coarse lattice functions.

Multigrid methods require transferring the differential equation, Eq. (2.2), to a new
coarser grid with spacing $a' = \lambda a$. Again, for clarity we consider the single case $a' = 2a$.
The coarse grain sites will be designated by primes, $x', y' = a'(n_1, n_2, n_3, n_4)$, where
$n_i = 1, 2, 3, \dots, N_i/2$, and functions on this grid will be given hats (e.g. $\hat L\hat\Psi(x') = \hat b(x')$). The
new fermion PDE on the $a'$ lattice is

$$\sum_{\mu} \tfrac{1}{2}\left(\gamma^{\mu} + R'\right)\hat U_{\mu}(x')\,\hat\Psi(x' + 2\mu) - Z_\psi\,(M' + 4R')\,\hat\Psi(x') = \hat b(x'). \qquad (2.4)$$

A two level multigrid cycle, indicated in Figure 2, has four steps to update $\Psi^t \to \Psi^{t+1}$:

1. $\nu_1$ iterations on the fine lattice to get a smoothed approximate solution $\tilde\Psi^t$ to
Eq. (2.2);

2. Projection of the residual $r^t(x) = b(x) - L\tilde\Psi^t(x)$ onto the coarse lattice,
$\hat r^t(x') = \sum_x P(x', x)\, r^t(x)$;

3. $\nu_2$ iterations of $\hat L\hat e^t(x') = \hat r^t(x')$ on the coarse lattice to find an approximate
solution for the error $\hat e^t$;

4. Interpolation of the error, $e^t(x) = \sum_{x'} Q(x, x')\,\hat e^t(x')$, to obtain the new approximate
solution $\Psi^{t+1}(x) = \tilde\Psi^t(x) + e^t(x)$.

The generalization of this two level algorithm to a recursive multi-level algorithm is
straightforward.
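The four steps can be illustrated on a much simpler problem. The following Python/NumPy sketch is a toy and not the QCD code: it uses a periodic 1D lattice with operator $(L\psi)(x) = (2\psi(x) - \psi(x+a) - \psi(x-a))/a^2 + m^2\psi(x)$, which shows the same critical slowing down as $m^2 \to 0$. Damped Jacobi stands in for the smoother, P is full weighting, Q is linear interpolation (so $Q = 2P^T$, a transpose up to a constant), the coarse operator is re-discretized with $a' = 2a$ and the naive $M' = M$, and step 3 is an exact solve ($\nu_2 \to \infty$). All function names are illustrative.

```python
import numpy as np


def apply_L(psi, a, m2):
    """Periodic 1D toy operator: (2 psi(x) - psi(x+a) - psi(x-a)) / a^2 + m2 psi(x)."""
    return (2 * psi - np.roll(psi, -1) - np.roll(psi, 1)) / a**2 + m2 * psi


def dense_L(n, a, m2):
    """Dense matrix of apply_L on n sites (used for the small exact solves)."""
    return np.column_stack([apply_L(col, a, m2) for col in np.eye(n)])


def smooth(psi, b, a, m2, sweeps, omega=2 / 3):
    """Damped Jacobi sweeps, a stand-in for the fine-lattice iterations of step 1."""
    diag = 2 / a**2 + m2
    for _ in range(sweeps):
        psi = psi + omega * (b - apply_L(psi, a, m2)) / diag
    return psi


def restrict(r):
    """P: full weighting onto the even (coarse) sites x' = 2j."""
    return 0.25 * np.roll(r, 1)[::2] + 0.5 * r[::2] + 0.25 * np.roll(r, -1)[::2]


def interpolate(e_c):
    """Q: linear interpolation back to the fine lattice (Q = 2 P^T here)."""
    e = np.empty(2 * e_c.size)
    e[::2] = e_c
    e[1::2] = 0.5 * (e_c + np.roll(e_c, -1))
    return e


def two_grid_cycle(psi, b, a, m2, nu1=2):
    psi = smooth(psi, b, a, m2, nu1)                           # 1. nu1 fine-lattice iterations
    r_c = restrict(b - apply_L(psi, a, m2))                    # 2. project the residual
    e_c = np.linalg.solve(dense_L(r_c.size, 2 * a, m2), r_c)   # 3. exact coarse solve, a' = 2a, M' = M
    return psi + interpolate(e_c)                              # 4. interpolate and correct


# Point source b(x) = delta(x - 0), as for a Green's function.
N, a, m2 = 64, 1.0, 1e-4
b = np.zeros(N)
b[0] = 1.0
exact = np.linalg.solve(dense_L(N, a, m2), b)
psi = np.zeros(N)
for _ in range(15):
    psi = two_grid_cycle(psi, b, a, m2)
rel_err_mg = np.linalg.norm(psi - exact) / np.linalg.norm(exact)
rel_err_jacobi = np.linalg.norm(smooth(np.zeros(N), b, a, m2, 30) - exact) / np.linalg.norm(exact)
print(rel_err_mg, rel_err_jacobi)
```

With $m^2 = 10^{-4}$, 15 two-grid cycles bring the relative error below $10^{-8}$, while 30 plain Jacobi sweeps (the same number of fine-lattice sweeps) leave it close to 1, because the smooth, nearly massless modes barely change.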

Now we need to discuss in detail the construction of the restriction matrix $P(x', x)$, the
coarse grain linear operator $\hat L$, and the interpolation matrix $Q(x, x')$. These must respect
the lattice symmetries: most importantly for our problem, gauge invariance and the proper
conformal scaling relations as $a \to a'$.


3.  GAUGE  AND  CONFORMAL  SYMMETRY  CONSTRAINTS 

The hypercubic symmetry group and the $a' = 2a$ translations are rather trivial to
preserve at each stage; symmetric operations on the elementary cells guarantee this aspect.
More interesting is the ease with which gauge invariance is satisfied. At each site we have
the local symmetry $\Psi(x) \to S_x\Psi(x)$ and $U_\mu(x) \to S_x U_\mu(x) S^\dagger_{x+\mu}$ for any SU(3) rotation
$S_x$. We require that the subset of symmetries $S_{x'}$ be preserved for the new $\hat\Psi(x')$ and
$\hat U_\mu(x')$ on the coarse grain lattice. For the projection operator this is accomplished by
bringing the points from x to x' by the parallel transport rule of an ordered product of
gauge factors on the path from x to x'. Together with the hypercubic symmetries, this
results in the ansatz

$$P(x', x) = Z_P \sum_{\text{paths}} w_l\, U_{\mu_1}(x')\,U_{\mu_2}(x' + \mu_1) \cdots U_{\mu_l}(x - \mu_l), \qquad (3.1)$$

where l is the length of the path from x' to x.
FIG. 3a. Paths of length l = 1 and 2 for the gauge invariant projection matrix P(x', x),
from fine sites x (o) into coarse sites x' (×).

FIG. 3b. Paths in the gauge invariant interpolation operator Q(x, x') from x' sites
(×) to x sites (o).
The products, as indicated in Figure 3a, are taken along all the shortest paths into x' for the 2a
hypercube centered at x'. In the case of free field theory, $g \to 0$ (hence $U_\mu$ is gauge
equivalent to $U_\mu = 1$), we must have $Z_P = 1 + O(g^2)$ in order that $P(x', x)$ takes a
constant field $\Psi(x)$ into a constant field $\hat\Psi(x')$ with the same norm. The relative weights
$w_l$ are arbitrary. While it is not strictly required by symmetry considerations, it might be
reasonable to include with each link matrix $U_\mu$ the corresponding spin factor $(\gamma^\mu + R)/2$.
The underlying principle is to respect the smoothing dynamics by mimicking the transport
factors of the PDE on the fine grain lattice. For example, for R = 1 these factors project
to zero two of the four spin components, so for Wilson fermions it is sensible to respect this
veto in the multigrid projection as well.

The interpolation matrix can be taken to be the Hermitian transpose of P, up to an
overall constant,

$$Q(x, x') = Z_Q \sum_{\text{paths}} w_l \left[U_{\mu_1}(x')\,U_{\mu_2}(x' + \mu_1) \cdots U_{\mu_l}(x - \mu_l)\right]^{\dagger}, \qquad (3.2)$$

as indicated in Figure 3b. In order for $Q(x, x')$ to interpolate a constant field $\hat\Psi(x')$ into
the same constant field $\Psi(x)$, we must have $Z_Q = 1 + O(g^2)$ and $w_l = 1/(2l)!! + O(g^2)$.
As we will see, it is surprisingly cheap in execution time to compute the transformations
$\hat r = Pr$ and $e = Q\hat e$. In fact, the time required for this step is only half that of a
single conjugate gradient iteration on the fine lattice! Exact gauge invariance is not only
maintained; it follows directly from the basic parallel transport interpretation needed to
gather (in P) and scatter (in Q) the lattice points.
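The gather/scatter picture can be checked on a toy model. This Python/NumPy sketch assumes a periodic 1D lattice with U(1) link phases instead of SU(3) matrices on a 4D lattice, uses only paths of length 1, and picks the hypothetical weights 1/2 (the site itself) and 1/4 (each neighbor); with these choices $Q = 2P^\dagger$. It checks that P and Q transform covariantly under a random gauge transformation and that, with all links equal to 1, they map constant fields to the same constant.

```python
import numpy as np


def project(psi, U):
    """Toy gauge-covariant P onto even sites; U[x] is the U(1) link from x to x+1.

    Neighbor values are parallel-transported to x' = 2j before averaging
    (hypothetical weights: 1/2 for the site itself, 1/4 for each neighbor).
    """
    up = U[::2] * np.roll(psi, -1)[::2]                        # U(2j) psi(2j+1)
    down = np.conj(np.roll(U, 1))[::2] * np.roll(psi, 1)[::2]  # U(2j-1)^dagger psi(2j-1)
    return 0.5 * psi[::2] + 0.25 * (up + down)


def interpolate(psi_c, U):
    """Toy Q = 2 P^dagger: copy onto even sites, transport-average onto odd sites."""
    psi = np.empty(2 * psi_c.size, dtype=complex)
    psi[::2] = psi_c
    # Odd site 2j+1 receives from 2j (via U(2j)^dagger) and from 2j+2 (via U(2j+1)).
    psi[1::2] = 0.5 * (np.conj(U[::2]) * psi_c + U[1::2] * np.roll(psi_c, -1))
    return psi


def gauge_transform(psi, U, g):
    """psi(x) -> g(x) psi(x),  U(x) -> g(x) U(x) g(x+1)^dagger."""
    return g * psi, g * U * np.conj(np.roll(g, -1))


rng = np.random.default_rng(0)
N = 16
U = np.exp(1j * rng.uniform(0, 2 * np.pi, N))   # random U(1) links
g = np.exp(1j * rng.uniform(0, 2 * np.pi, N))   # random gauge transformation
psi = rng.standard_normal(N) + 1j * rng.standard_normal(N)
psi_c = rng.standard_normal(N // 2) + 1j * rng.standard_normal(N // 2)

psi_g, U_g = gauge_transform(psi, U, g)
print(np.allclose(project(psi_g, U_g), g[::2] * project(psi, U)))                # P is covariant
print(np.allclose(interpolate(g[::2] * psi_c, U_g), g * interpolate(psi_c, U)))  # Q is covariant
```

Because every neighbor value is parallel-transported before it is averaged, covariance holds exactly for any link configuration, not just on average.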

Finally, we must consider the form of $\hat L$ in greater detail. If the problem were being
solved for the case of free field theory (i.e. no background gauge fields, $U_\mu(x) = 1$, and
gauge coupling $g = 0$), the obvious choice would be to represent $\hat U_\mu(x') = 1$, and the naive scaling
relation for the mass parameter is $M' = M$. (Set $R = R' = 1$ for now.) Such a free theory
has naive dimensional scaling. With non-trivial gauge fields, the basic idea of multigrid
(and of the renormalization group) suggests that we hold fixed the longest physical correlation
length $a\xi_\pi$, or equivalently the pion mass $m_\pi = 1/a\xi_\pi$. Thus the solution to $L\Psi = b$ with
a point source at $x = 0$ obeys

$$\sum |\Psi|^2 \sim C_\pi \exp(-n/\xi_\pi), \qquad (3.3)$$

with n being the distance in lattice units a. Then if we solve the same problem on the
coarse lattice, $\hat L\hat\Psi = \hat b$, and lift the solution back to the fine lattice, it should have the same behavior,

$$\sum |\hat\Psi|^2 \sim C'_\pi \exp(-n'/\xi'_\pi), \qquad (3.4)$$

with n' being the distance in lattice units $a' = 2a$. Hence $\xi'_\pi = \xi_\pi/2$ and $C'_\pi = C_\pi$. If M
were the only length scale in the problem, $M' = M$ would follow automatically, and the
product of "wave function Z factors" would be $Z_P Z_Q / Z_\psi = 1$. In fact, for the free field
theory we have taken $Z_P = Z_Q = 1$, so that P and Q separately map constant solutions
into constant solutions with the magnitude unchanged.

4.  RENORMALIZATION  GROUP  CONSIDERATIONS 

When the background fields are nonzero ($g^2 > 0$) there is a new length scale due to the
correlation length of the gauge field itself. For example, the string tension length scale $a\xi_\sigma$
is the inverse of the square root of the coefficient of the area law measured by the Wilson
loop. Now the renormalization group instructs us to change the bare coupling constant
and the bare quark mass parameter in order to keep fixed physical quantities such as the
correlation lengths, masses or amplitudes as the lattice scale a becomes a'. In differential
form the Callan-Symanzik equations are

$$\left(a'\frac{\partial}{\partial a'} + \beta\,\frac{\partial}{\partial g} + \gamma_M\,\frac{\partial}{\partial M} - \gamma\right) \times (\text{Physical Quantity}) = 0. \qquad (4.1)$$

Naive scaling (i.e. $\gamma = 0$ from simple dimensional analysis) works only in special cases,
like free field theory. However, in the continuum limit $a' \to 0$ we do have scaling laws from
the fixed point of $\beta$ at $g = 0$. Thus we have


$$a'\frac{\partial g}{\partial a'} = \beta(g, M) = b_0 g^3 + b_1 g^5 + \dots, \qquad (4.2a)$$

$$a'\frac{\partial M}{\partial a'} = \gamma_M(g, M) = c_0 g^2/a' + c_1 M g^2 + \dots, \qquad (4.2b)$$

where the coefficients $b_0$, $b_1$, $c_0$ and $c_1$ are known from perturbation theory. [11] The
solutions give improved scaling laws for $g(a')$ and $M(a', g(a'))$.

Roughly, the result is a slowly varying $g(a')$,

$$g(a') = \frac{g}{\sqrt{1 - 2 b_0 g^2 \log(a'/a)}}, \qquad (4.3)$$

due to asymptotic freedom (no $g^1$ term in $\beta(g, M)$), and, if we ignore the $g(a')$ variation
in $\gamma_M$, a simple power law for $M(a')$,

$$M(a', g(a')) \simeq M_{cr} + \left(M(a, g) - M_{cr}\right)\left(\frac{a'}{a}\right)^{\gamma_m}. \qquad (4.4)$$
(The correct solution to the coupled equations, Eqs. (4.2a) and (4.2b), is also easily found,
but the additional logarithms tend to obscure the discussion at this point.) In our
approximate solution, the anomalous scaling exponent is $\gamma_m = c_1 g^2$ and the shift in the
critical mass is $M_{cr} = -c_0 g^2/a'$, due to the explicit chiral symmetry breaking of Wilson
fermions. The scaling solution is consistent with the standard chiral parameterization,
where $(m_\pi)^2 = (\text{const})\,\Lambda_{QCD}\, m_{quark}$, with the renormalized $m_{quark}$ properly defined. [10]
In any event, the general lesson is that to keep $a\xi_\pi = 1/m_\pi$ fixed as we pass to the coarse
grained lattice, the naive scaling must be discarded.
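A few lines of Python show how quickly Eq. (4.3) leaves the perturbative regime under repeated coarsening $a \to 2a$. This sketch assumes the standard one-loop coefficient for SU(3) without dynamical quarks, $b_0 = 11/(16\pi^2)$, in the convention $a'\,\partial g/\partial a' = b_0 g^3$; $\gamma_m$ is left as a free input because numerical values of $c_0$ and $c_1$ are not quoted here. The function names are illustrative.

```python
import math

B0 = 11 / (16 * math.pi**2)  # assumed one-loop coefficient: SU(3), no dynamical quarks


def coarsen_coupling(g2, ratio=2.0, b0=B0):
    """Eq. (4.3), squared: g^2(a') = g^2 / (1 - 2 b0 g^2 log(a'/a)); None once it breaks down."""
    denom = 1.0 - 2.0 * b0 * g2 * math.log(ratio)
    return g2 / denom if denom > 0.0 else None


def coarsen_mass(M, M_cr, gamma_m, ratio=2.0):
    """Eq. (4.4) with gamma_m held fixed: M' = M_cr + (M - M_cr) (a'/a)**gamma_m."""
    return M_cr + (M - M_cr) * ratio**gamma_m


g2, steps = 1.0, 0  # g = 1, i.e. beta_QCD = 6
while (nxt := coarsen_coupling(g2)) is not None:
    g2, steps = nxt, steps + 1
    print(steps, round(g2, 4))
print("one-loop formula breaks down after", steps, "coarsenings")
```

Starting from $g = 1$, one coarsening gives $g^2 \approx 1.107$, and the one-loop formula breaks down after ten factor-of-two coarsenings, in line with the warning below that the effects accumulate as we go deeper.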

How important are these effects? Probably they are not too crucial for a single rescaling
$a \to a' = 2a$, but as we go deeper they accumulate, and we are driven out of the perturbative
regime where these estimates apply. Then we must perform lattice renormalization directly.
Since the string tension is well known, $g(a')$ can be computed numerically quite easily
by looking at Wilson loops on the new lattices. As we approach strong coupling, $m_\pi$ is
known, so the appropriate adjustment to $M(a', g(a'))$ can be estimated. For example, at
$1/g^2 = 0$, $(am_\pi)^2 = (21/5)(aM + 2)$, so that here $\gamma_m = 1$ and $M_{cr} = -2$, certainly quite
different from the free field scaling. In between weak and strong coupling, some smooth
interpolation or a more detailed real space renormalization group procedure should be used.
Various schemes are presently under consideration.

We also need to deal with the construction of the effective fields $\hat U_\mu(x')$ on the $a'$
lattice. As a guide, one can perform the computation of full decimation on the path integral
by integrating out all the Grassmann variables $\psi(x)$ except the 1 out of 16 that survive on the
coarse lattice. Then, expanding in the Wilson hopping parameter $\kappa = (2aM + 8R)^{-1}$ to
second order, one obtains renormalized constants $Z_\psi$, $M'$ and $R'$ and a gauge invariant
choice $\hat U_\mu(x') = U_\mu(x + \mu)\,U_\mu(x)$. This is both unitary and simple. However, better
scaling properties are usually obtained by taking a weighted average of nearby paths from
$x'$ to $x' + 2\mu$, such as those encountered by expanding farther in $\kappa$. (See, for example,
Figure 4.) Then the sum is not unitary. Although one could reunitarize the sum, no basic
principle forces one to do so.
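As a quick consistency check (our arithmetic, taking a = 1 and R = 1 as fixed in Section 2), the critical masses quoted in the text correspond to the following critical values of the hopping parameter $\kappa = (2aM + 8R)^{-1}$:

$$\kappa_{cr} = \frac{1}{2M_{cr} + 8} = \frac{1}{8 - 1.6062} \approx 0.1564 \quad (g = 1), \qquad \kappa_{cr} = \frac{1}{2(-2) + 8} = \frac{1}{4} \quad (1/g^2 = 0).$$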

FIG. 4. The simplest unitary matrix for the coarse operator $\hat L$ is $\hat U_\mu(x') = U_\mu(x + \mu)\,U_\mu(x)$,
with two additional examples of paths of length l = 4 that could be added to "renormalize" $\hat U_\mu$
further.

Finally, we need a criterion for the overall normalization of $\hat L$. The most
serious constraint is the coefficient $C_\pi$ of the longest correlation length, $C'_\pi = C_\pi$. This
adjusts the product $Z_P Z_Q / Z_\psi$, with additional order $g^2$ terms appearing in perturbation
theory. The individual values of the Z factors are a matter of convention. To see this
overall normalization constraint, consider the two level algorithm where the exact solution,
$\hat e = \hat L^{-1}\hat r$, is found on the coarse lattice ($\nu_2 \to \infty$) and no iterations are done on the top
level ($\nu_1 = 0$). Then the multigrid iteration is $\Psi^{t+1} = \Psi^t + Q\hat L^{-1} P r^t$, or

$$\Psi^{t+1} = \left(1 - Q\hat L^{-1} P L\right)\Psi^t + Q\hat L^{-1} P b. \qquad (4.5)$$

This is the familiar Jacobi iteration with the coarse lattice operator as the preconditioning
matrix. The normalization condition is that $Q\hat L^{-1} P L$ has leading eigenvalue one, so that the
longest wavelength is removed immediately in the first iteration. As in the over-relaxation
algorithms, the constraint may be slightly modified in numerical work. Additional
parameters may be introduced in the operator $\hat L$ to allow fine-tuning to remove correlations
on smaller and smaller scales. However, unlike in the traditional real space renormalization
group, this is probably redundant with the recursive use of multigrid and thus not
warranted in terms of computational efficiency. We have just begun to investigate the rich
variety of options available to accelerate our basic multigrid recipe.

5.  NUMERICAL  IMPLEMENTATION 

We are currently in the process of applying the multigrid method to accelerate the
computation of quark propagators in lattice QCD. Although this study has not been completed,
the early indications are that even the simplest implementation of the multigrid method
will lead to considerable improvement in the convergence rate over the standard iterative
methods in use for this problem. We describe here some of the special considerations
required for applying the multigrid idea to lattice gauge theory.
It is first important to note that the Dirac operator L(x,y) in Eq. (2.2) is not Symmetric Positive Definite (SPD). Thus for the smoothing step we solve instead the system

$$L L^\dagger \phi = b \qquad (5.1)$$

for $\phi$, and note that the solution we seek is $\psi = L^\dagger \phi$. For this we use the least-norm Conjugate Residual algorithm described by Oyanagi [12]. The disadvantage of solving Eq. (5.1) is that we must apply the operator L twice; with this algorithm, however, we do not need to compute $LL^\dagger$, and indeed we never store the actual components of L or $L^\dagger$: we need only compute the effects of L and $L^\dagger$ on a particular vector to implement the algorithm.
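The structure of this solve can be sketched without ever forming LL†. The sketch below substitutes the standard CGNE iteration (Craig's method), a conjugate-gradient iteration on LL†φ = b that returns ψ = L†φ directly, for the least-norm Conjugate Residual algorithm of Oyanagi [12], which is not reproduced here. The periodic 1D toy operator `apply_L` (a mass m plus antisymmetric hopping) and its adjoint `apply_Ldag` are hypothetical stand-ins for the Dirac operator; only their actions on vectors are used.

```python
import numpy as np

def cgne(apply_L, apply_Ldag, b, tol=1e-10, max_iter=1000):
    """CGNE (Craig's method): CG on L L^dagger phi = b, returning psi = L^dagger phi.

    Only the actions of L and L^dagger on vectors are used; L L^dagger is never formed.
    """
    psi = np.zeros_like(b)
    r = b.copy()                  # residual b - L psi (equal to b - L L^dagger phi)
    p = apply_Ldag(r)             # search direction, already mapped by L^dagger
    rr = np.vdot(r, r).real
    b_norm = np.linalg.norm(b)
    for _ in range(max_iter):
        alpha = rr / np.vdot(p, p).real
        psi = psi + alpha * p
        r = r - alpha * apply_L(p)
        rr_new = np.vdot(r, r).real
        if np.sqrt(rr_new) <= tol * b_norm:
            break
        p = apply_Ldag(r) + (rr_new / rr) * p
        rr = rr_new
    return psi

# Hypothetical toy operator: mass term plus antisymmetric nearest-neighbour
# hopping on a periodic 1D lattice, so L is not Hermitian.
m = 0.1

def apply_L(v):
    return m * v + 0.5 * (np.roll(v, -1) - np.roll(v, 1))

def apply_Ldag(v):
    return m * v - 0.5 * (np.roll(v, -1) - np.roll(v, 1))

rng = np.random.default_rng(0)
b = rng.normal(size=64) + 1j * rng.normal(size=64)
psi = cgne(apply_L, apply_Ldag, b)
print(np.linalg.norm(apply_L(psi) - b) / np.linalg.norm(b))
```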

The most computationally intensive part of the calculation is the multiplication of the fermion wavevectors at each lattice site by the gauge link matrices. In the computer, the Dirac spinor $\psi(x)$ is represented by an array V(IDX,IC,IS), where IDX runs over all of the sites in the lattice, IC runs over the SU(3) colour components (both real and imaginary parts), and IS runs over the spinor components. We work in the Euclidean chiral basis, for which the Dirac γ-matrices are block off-diagonal. The gauge link variables $U_\mu(x)$ are stored in an array UMTRX(IDX,IC,MU), where IDX runs over the lattice sites, IC runs over the real and imaginary components of the 3 × 3 SU(3) matrices, and MU runs over the directions of the links. Thus for a $16^4$ lattice we must store over $6 \times 10^6$ variables and perform over $2 \times 10^6$ matrix multiplications for each smoothing iteration. The code has been written for a vector processor such as the Cyber 205 to perform these matrix multiplications on all lattice sites at the same time.
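The storage estimate can be checked with a few lines of arithmetic. This Python sketch counts real numbers for a single spinor field plus the gauge links of the finest $16^4$ lattice only; a working solver also keeps residual and search vectors, so the figure is a lower bound.

```python
NS = 16                                   # sites per direction
sites = NS ** 4
spinor_reals = 3 * 2 * 4                  # colour x (real, imaginary) x spin
link_reals = 3 * 3 * 2 * 4                # complex 3x3 matrix x 4 link directions
total = sites * (spinor_reals + link_reals)
print(sites, total)                       # 65536 6291456
```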

For a lattice having NS sites in each direction, the index IDX of a particular site (x, y, z, t) in the spinor and gauge-link arrays is computed as

$$\mathrm{IDX} = x + (y-1)\,\mathrm{NS} + (z-1)\,\mathrm{NS}^2 + (t-1)\,\mathrm{NS}^3 \qquad (5.2)$$

To store the gauge links for the coarser lattices, we introduce an additional coordinate H = 1, 2, 3, ... to label the grids and add an offset to IDX based on the grid in which the site is located,

$$\mathrm{IDX} = x + (y-1)\,\mathrm{NS} + (z-1)\,\mathrm{NS}^2 + (t-1)\,\mathrm{NS}^3 + \mathrm{LATSTEP}(H). \qquad (5.3)$$

That is, we store the gauge links for all grids one right after the other. Since each grid has 1/16 as many sites as the one before it, the total storage required for all of the grids is less than 16/15 times the size of the largest grid.
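A minimal sketch of the indexing scheme of Eqs. (5.2) and (5.3), assuming (hypothetically) that grid H has NS/2^(H-1) sites per direction, that NS in Eq. (5.3) is read as that per-grid size, and that LATSTEP(H) is the number of sites stored for grids 1, ..., H-1. The helper names `latstep` and `idx` are illustrative, not the names used in the authors' code.

```python
def latstep(h, ns):
    """Hypothetical LATSTEP(H): sites stored for grids 1..H-1.

    Assumes grid g has ns // 2**(g - 1) sites in each of the four directions.
    """
    return sum((ns // 2 ** (g - 1)) ** 4 for g in range(1, h))

def idx(x, y, z, t, h, ns):
    """1-based index of site (x, y, z, t) on grid h, after Eq. (5.3),
    reading NS as the per-direction size of grid h."""
    n = ns // 2 ** (h - 1)
    return x + (y - 1) * n + (z - 1) * n ** 2 + (t - 1) * n ** 3 + latstep(h, ns)

print(idx(1, 1, 1, 1, 1, 16))       # 1: first site of the finest grid
print(idx(16, 16, 16, 16, 1, 16))   # 65536: last site of the finest grid
print(idx(1, 1, 1, 1, 2, 16))       # 65537: first site of the next grid
print(latstep(6, 16))               # 69905 sites for grids 1..5
```

The last value is just under 16/15 × 65536, as the geometric series 1 + 1/16 + 1/256 + ... predicts.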
We take for the gauge link on a coarser lattice the product of the matrices from the two corresponding links on the fine lattice. For the fermion propagator calculation we need only compute these links once at the beginning of the run. In other applications, where the links must be recomputed, this could be done relatively quickly by saving the appropriate indices for vector gather and scatter operations.
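The construction of the coarse links can be sketched with NumPy. The array layout U[t, z, y, x, mu, a, b] with periodic boundaries, the random unitary matrices used as stand-ins for SU(3) links, and the function names are assumptions for illustration only; the authors' Fortran arrays are organized as described above.

```python
import numpy as np

def random_links(rng, ns):
    """Random 3x3 unitary links U[t, z, y, x, mu] (stand-ins for SU(3))."""
    U = np.empty((ns, ns, ns, ns, 4, 3, 3), dtype=complex)
    for index in np.ndindex(ns, ns, ns, ns, 4):
        z = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
        U[index], _ = np.linalg.qr(z)     # the Q factor is unitary
    return U

def coarse_links(U):
    """Coarse link U'_mu(x') = U_mu(x) U_mu(x + mu) with x = 2x' (periodic).

    mu = 0, 1, 2, 3 is taken to mean x, y, z, t, i.e. lattice axes 3, 2, 1, 0.
    """
    ns = U.shape[0]
    coarse = np.empty((ns // 2,) * 4 + (4, 3, 3), dtype=complex)
    for mu in range(4):
        axis = 3 - mu
        U_mu = U[..., mu, :, :]
        U_next = np.roll(U_mu, -1, axis=axis)      # U_mu(x + mu)
        product = U_mu @ U_next                     # batched 3x3 products
        coarse[..., mu, :, :] = product[::2, ::2, ::2, ::2]
    return coarse

rng = np.random.default_rng(1)
U = random_links(rng, 4)
C = coarse_links(U)
print(C.shape)   # (2, 2, 2, 2, 4, 3, 3)
```

Since a product of unitary matrices is unitary, the coarse links remain valid transporters.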

For projecting the residuals from a fine lattice to a coarse lattice, we simply take the value of the residual at the fine-grid sites that are also in the coarse lattice (Eq. (3.1) with all weights zero except the weight of the coincident site, which is one). An improvement would be to average over values from neighbouring sites, gauge covariantly transported to the coarse-grid sites. This, however, is not a crucial step, because after some number of smoothing operations with the semi-local operator L the nearby sites will already be suitably averaged.

The interpolation of the improved error from the coarse grid to the fine grid is more important. The result must be gauge covariantly transported from the coarse-grid sites to the fine-grid sites, where it is to be averaged over all of the distinct paths leading to each site, as in Eq. (3.2). To accomplish this we use the following algorithm. To each site in the lattice we assign a flag, called a data bit. For sites on the coarse lattice we set the data bit ON (“I have data”), while for the other sites, where the improved error is zero, we set the bit OFF (“I don’t have data”). Now, for each site with data bit ON, we look among the neighbouring sites for sites with the data bit OFF. For each of these, we transport the data across the link (multiplying by the gauge matrix) and set the corresponding data bit ON. Once this is done at all sites in all directions, the improved error has been distributed to all of the nearest-neighbour sites. Repeating the process three more times spreads the result to all nearby sites. The effect is to interpolate first at the centers of the edges of squares, then at the centers of squares (the faces of cubes), then at the centers of cubes (the faces of hypercubes), and finally at the centers of hypercubes.

In  this  way  we  do  not  perform  any  unnecessary  matrix  multiplications.  All  gauge  links 
are  involved  in  exactly  one  matrix  multiplication,  so  the  interpolation  step  is  very  efficient. 
It  requires  only  one  quarter  of  the  number  of  operations  required  for  one  conjugate-residual 
step  on  the  fine  lattice  (1/4  of  a  Work  Unit). 
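The data-bit algorithm can be sketched as follows. This NumPy version rests on assumptions of our own: a periodic lattice with even extents, links stored as U[mu][x] joining x to x + mu, coarse sites at even coordinates, and contributions that reach a site in the same pass averaged with equal weights. Boolean masks play the role of the data-bit vector and of the compress/scatter instructions, and the function counts the matrix-vector products it performs, so the one-multiplication-per-link property can be checked.

```python
import numpy as np

def interpolate_data_bits(psi, U):
    """Spread coarse-site values to every fine site by gauge-covariant transport.

    psi : complex array, shape lattice + (3,); only coarse sites (all
          coordinates even) are read.
    U   : complex array, shape (d,) + lattice + (3, 3); U[mu][x] is the link
          from x to x + mu. Periodic boundaries, even extents.
    Returns the interpolated field and the number of matrix-vector products.
    """
    shape = psi.shape[:-1]
    d = len(shape)
    on = np.all(np.indices(shape) % 2 == 0, axis=0)     # data bits
    psi = np.where(on[..., None], psi, 0)
    mults = 0
    for _ in range(d):      # edges, squares, cubes, hypercubes, ...
        total = np.zeros_like(psi)
        count = np.zeros(shape, dtype=int)
        for mu in range(d):
            # From x + mu to x: U_mu(x) psi(x + mu), where x is OFF and x + mu is ON.
            mask = ~on & np.roll(on, -1, axis=mu)
            src = np.roll(psi, -1, axis=mu)[mask]
            total[mask] += np.einsum('nab,nb->na', U[mu][mask], src)
            count[mask] += 1
            mults += int(mask.sum())
            # From x - mu to x: U_mu(x - mu)^dagger psi(x - mu).
            mask = ~on & np.roll(on, 1, axis=mu)
            src = np.roll(psi, 1, axis=mu)[mask]
            link = np.roll(U[mu], 1, axis=mu)[mask].conj().swapaxes(-1, -2)
            total[mask] += np.einsum('nab,nb->na', link, src)
            count[mask] += 1
            mults += int(mask.sum())
        new = count > 0                   # sites that receive data in this pass
        psi[new] = total[new] / count[new][:, None]
        on |= new                         # set their data bits ON
    return psi, mults

shape = (4, 4)
U_id = np.broadcast_to(np.eye(3, dtype=complex), (2,) + shape + (3, 3)).copy()
field = np.ones(shape + (3,), dtype=complex)
result, mults = interpolate_data_bits(field, U_id)
print(mults)   # 32 = 2 x 16, one product per link
```

On a lattice of extent 4 in d dimensions the count is d × 4^d, the number of links, since each link joins a site with k odd coordinates to one with k ± 1 and is crossed only in the pass that fills the latter.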

One great advantage of this algorithm is that it can be easily implemented on a vector processor such as the Cyber 205. We can use the data bit vector to collect the appropriate gauge link variables and improved errors with vector compress instructions, perform the matrix multiplications with vector instructions, and then distribute the results with a scatter instruction.
It should be clear that a slight modification of this interpolation algorithm for $Q(x, x')$ can be used in the projection step for $P(x', x)$ to obtain the average value of the residual transported to the coarse-grid sites from nearby sites, as in Eq. (3.1).

Computer  programs  based  on  the  principles  described  above  have  been  written  and 
are  currently  being  tested.  The  results  will  be  reported  shortly. 

6. FERMION MONTE CARLO SIMULATION

The methods discussed above may also find application beyond the problem of calculating quark propagators on large lattices. Indeed, a fundamental issue in the application of numerical methods to quantum field theories is how to simulate the quantum fluctuations of fermionic degrees of freedom. It can be shown that fermionic degrees of freedom, which are described by elements of a Grassmann algebra, introduce into the probability weight of the gauge field a new factor proportional to the determinant of the Dirac operator [6]. For the simulation of the gauge fluctuations, one should then calculate the variation of the determinant (or better, of its logarithm) in one-to-one correspondence with every modification of the gauge field. Computationally this can be done exactly only for very small lattices. For any lattice of practical interest, one must resort to approximate calculations of $\delta \ln \mathrm{Det}(L) = \delta\,\mathrm{Tr}(\ln L)$, where L is the lattice Dirac operator and δ is the variation at a link $U_\mu(x)$. The approximate calculations are based on methods which converge in some suitable limit, but they are perforce subject to truncation errors, and they are badly affected by critical slowing-down, even more severely than the single-propagator calculation.
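The identity underlying these calculations, δ ln Det(L) = Tr(L^{-1} δL), can be verified exactly on a tiny example. The dense 8 × 8 matrix below is a hypothetical stand-in for L, and δL touches only the two entries joining sites 2 and 3, so the trace reduces to the entries of L^{-1} across that single "link".

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
L = 4.0 * np.eye(n) + 0.3 * rng.normal(size=(n, n))   # hypothetical stand-in for L
dL = np.zeros((n, n))
dL[2, 3], dL[3, 2] = 1.0, -0.5       # variation touching only the "link" 2-3

def log_abs_det(A):
    sign, logdet = np.linalg.slogdet(A)
    return logdet

eps = 1e-6
finite_diff = (log_abs_det(L + eps * dL) - log_abs_det(L - eps * dL)) / (2 * eps)
trace_formula = np.trace(np.linalg.solve(L, dL))        # Tr(L^{-1} dL)
L_inv = np.linalg.inv(L)
link_formula = L_inv[3, 2] * dL[2, 3] + L_inv[2, 3] * dL[3, 2]   # entries across the link
print(finite_diff, trace_formula, link_formula)
```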

Our proposal has two parts. First, we will develop a multi-level Monte Carlo scheme for the gauge fields by a recursive iteration on coarsened lattices with $a \to 2a$, replacing $S_{\mathrm{gauge}}(U)$ and $\beta = 6/g^2$ by a coarser effective action $\bar{S}_{\mathrm{gauge}}(u)$ and $\bar{\beta}$. We have already found some algorithms for doing this which preserve detailed balance exactly [13]. Next, we include the fermionic determinant in the multigrid Monte Carlo iterations. Here, at each level, we must consider the variation of the non-local term,

$$\delta\,\mathrm{Tr}(\ln L) = \mathrm{Tr}\big(L^{-1}(x+\mu, x)\,(\gamma_\mu + r)\,\delta U_\mu(x)\big)/(2a) \qquad (6.1)$$

Note that the local variation at a link removes the trace over the sites. The problem is to compute the inverse $L^{-1}(x+\mu, x)$ across a link at each level by multigrid techniques. To see the basic idea, consider the replacement of Eq. (6.1) above by the equivalent expression

$$\mathrm{Tr}\big(\bar{L}^{-1}(x'+2\mu, x')\,\delta\bar{L}\big) + \Big[\mathrm{Tr}\big(L^{-1}(x+\mu, x)\,\delta L\big) - \mathrm{Tr}\big(\bar{L}^{-1}(x'+2\mu, x')\,\delta\bar{L}\big)\Big]. \qquad (6.2)$$
After a sufficient number of blocking or coarsening steps it will be possible to calculate the first term fully. The second term is in effect more local than the original problem, so it should converge more rapidly in the smoothing operation, or it can be evaluated more reliably by approximate methods. For example, with the Galerkin form $\delta\bar{L} = P\,\delta L\,Q$ and the iterative rule of Eq. (4.5), we obtain the multigrid algorithm

$$\mathrm{Tr}(A^{k+1}) = \mathrm{Tr}(\bar{L}^{-1}\delta\bar{L}) + \mathrm{Tr}\big((1 - Q\bar{L}^{-1}PL)\,A^k\big) \qquad (6.3)$$

to compute $\mathrm{Tr}(A) = \mathrm{Tr}(L^{-1}\delta L)$, where now the long-distance cut-off in the second term is provided by the same eigenvalue condition which guaranteed C'w = CT as before. The details of our multigrid implementation will be presented in future publications.
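Eq. (6.3) can be checked in a toy setting: the exact A = L^{-1}δL is a fixed point of the trace iteration, because Tr(QL̄^{-1}PδL) = Tr(L̄^{-1}PδLQ) = Tr(L̄^{-1}δL̄) by cyclicity of the trace and the Galerkin form. The sketch uses a massive 1D Laplacian for L, linear interpolation for Q, full weighting P = Q^T/2, and a single-link δL; all of these are illustrative assumptions, not the lattice QCD operators.

```python
import numpy as np

def massive_laplacian_1d(n, mass=0.1):
    """Hypothetical stand-in for L: 1D Dirichlet Laplacian plus a mass term."""
    return (2.0 + mass) * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def linear_interpolation(nc):
    """Q: nc coarse points -> 2*nc + 1 fine points (coarse j at fine 2j + 1)."""
    Q = np.zeros((2 * nc + 1, nc))
    for j in range(nc):
        Q[2 * j:2 * j + 3, j] = [0.5, 1.0, 0.5]
    return Q

nc = 7
n = 2 * nc + 1
L = massive_laplacian_1d(n)
Q = linear_interpolation(nc)
P = 0.5 * Q.T
L_bar = P @ L @ Q
dL = np.zeros((n, n))
dL[4, 5] = dL[5, 4] = -1.0          # hypothetical variation of a single link
dL_bar = P @ dL @ Q                 # Galerkin form

M = Q @ np.linalg.solve(L_bar, P @ L)

def trace_update(A):
    """Right-hand side of Eq. (6.3) for the iterate A."""
    return np.trace(np.linalg.solve(L_bar, dL_bar)) + np.trace((np.eye(n) - M) @ A)

A_exact = np.linalg.solve(L, dL)    # A = L^{-1} dL
print(np.trace(A_exact), trace_update(A_exact))
```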

Finally, there are very interesting applications which make use of mixed grids, with a fine grid embedded in a larger coarse grid. When one has a well-localized source, as for example in the computation of weak matrix elements, or with heavy quarks, or even in the mass calculations for the bound states, it is most important to have the best short-distance representation close to this source. At long distances even a relatively coarse lattice is a great improvement over the cramped periodic boundary conditions currently imposed just beyond the correlation distance. Use of such mixed-scale grids is common in the context of multigrid for PDEs. In this application one needs a more serious development of the renormalization group method presented here in order to ensure smooth matching conditions at the interface between the coarse and fine grids.

7.  CONCLUSIONS 

We  have  considered  here  multigrid  methods  both  as  a  means  of  accelerating  numerical 
calculations  of  quark  propagators  and  for  improving  the  simulation  of  quantum  fluctuations 
of fermionic degrees of freedom. As might be expected, the renormalization group has played a prominent role in these considerations. In non-linear interacting field theories such as QCD, renormalization effects lead to modifications of the scaling of parameters like the bare quark mass when the Dirac operator is projected from a fine lattice to a coarse lattice. Renormalization will also modify the choice of gauge link variables on the coarse lattice when the effects of nearby paths are considered.

There  are  undoubtedly  many  further  advances  to  be  made  in  this  field  by  bringing 
together  the  ideas  of  multigrid  methods  and  the  renormalization  group  in  order  to  formulate 
effective  computational  algorithms. 
Note Added in Proof:

Similar suggestions to the ones contained in this article have also been made by A. Brandt in Refs. 14 and 15, to formulate a lattice solver for the free-field Dirac equation and to calculate the fermionic determinant by applying multigrid methods. Brandt reports (private communication) the usual multigrid efficiency for the relaxation rate of his multigrid Dirac solver.

We are currently running computer simulations of our Dirac solver at JVNCC to assess its efficiency. There are many possible forms of the renormalized Dirac operator, and the correct tuning (i.e., renormalization) of the control parameters as a function of the lattice spacing will be crucial to ultimately finding an efficient algorithm. Although some very simple models can be shown analytically to be accelerated by following renormalization group prescriptions, the full Dirac problem is at present beyond such analysis. Thus a major simulation program must be carried out to confirm or refute our renormalization group multigrid proposal for the Dirac equation in QCD.

REFERENCES 

1. A. Brandt, Math. Comp. 31, 333 (1977), and in Multigrid Methods, edited by W. Hackbusch and U. Trottenberg, Lecture Notes in Mathematics Vol. 960 (Springer, Berlin, 1982).

2. M. Gell-Mann and F.E. Low, Phys. Rev. 95, 1300 (1954);

E.C.G. Stueckelberg and A. Petermann, Helv. Phys. Acta 26, 499 (1953).


3. K.G. Wilson, Rev. Mod. Phys. 47, 773 (1975).

4.  G.G.  Batrouni,  G.R.  Katz,  A.S.  Kronfeld,  G.P.  Lepage,  B.  Svetitsky,  and  K.G.  Wilson, 
Phys.  Rev.  D  32,  2736  (1985). 

5. J. Goodman and A.D. Sokal, Phys. Rev. Lett. 56, 1015 (1986);

A. Brandt, D. Ron and D.J. Amit, in Multigrid Methods II, edited by W. Hackbusch and U. Trottenberg, Lecture Notes in Mathematics Vol. 1228 (Springer-Verlag, Cologne 1985).

6.  M.  Creutz,  L.  Jacobs,  and  C.  Rebbi,  Phys.  Rept.  95,  201  (1983). 

7.  R.C.  Brower,  in  Gauge  Theories  in  High  Energy  Physics,  Proceedings  of  the  XXXVII 
Les  Houches  Summer  School,  edited  by  M.K.  Gaillard  and  R.  Stora,  North-Holland 
(1983). 

8. J.B. Kogut, Rev. Mod. Phys. 55, 775 (1983).

9. K.G. Wilson, Phys. Rev. D10, 2445 (1974).

10. M. Bochicchio, L. Maiani, G. Martinelli, G.C. Rossi and M. Testa, Nucl. Phys. B262, 331 (1985);

L.  Maiani  and  G.  Martinelli,  CERN  Preprint  CERN-TH.  4467/86. 

11. A. Gonzalez-Arroyo, G. Martinelli and F.J. Yndurain, Phys. Lett. 117B, 437 (1982);
H.  Hamber  and  C.-M.  Wu,  Phys.  Lett.  133B,  357  (1983). 

12.  Y.  Oyanagi,  Comp.  Phys.  Comm.  42,  333  (1986),  Eqn.  (A.7). 

13.  R.  C.  Brower  and  R.  C.  Giles,  work  in  progress. 

14. A. Brandt, Proceedings of the International Congress of Mathematicians, Berkeley, California (1986).

15. A. Brandt, preliminary Proceedings of the Third Copper Mountain Multigrid Conference (Copper Mountain, Colorado, April 6-8, 1987).


Design  and  Implementation  of 
Parallel  Multigrid  Algorithms 


Tony F. Chan

University  of  California,  Los  Angeles 
Research  Institute  for  Advanced  Computer  Science,  NASA  Ames 

Ray S. Tuminaro
Stanford  University 

Research  Institute  for  Advanced  Computer  Science,  NASA  Ames 


INTRODUCTION 

We  discuss  the  implementation  of  multigrid  algorithms  for  the  solution  of  partial  differential 
equations  on  multiprocessors.  The  first  part  of  this  paper  considers  implementations  on 
hypercubes.  We  show  how  the  topology  of  the  hypercube  fits  the  data  flow  of  the  multigrid 
algorithm,  and  therefore  allows  parallel  implementations  with  relatively  low  communication 
cost. We present a timing model for the execution time which accurately predicts experimental results obtained from runs on an Intel iPSC system. We further use this model to explore the influence of machine and algorithm parameters on the efficiency of the method. The second part of this paper addresses a load-balancing problem that creates inefficiency on large processor systems, caused by processors becoming idle on coarse grids. We propose changes to the basic multigrid algorithm which exploit these idle processors and accelerate convergence. Analysis and examples of this new multigrid algorithm are given for a one-dimensional model problem.


1.  OVERVIEW. 

In this paper, we first consider an implementation of the basic multigrid method on a distributed-memory, message-passing multiprocessor. The multigrid algorithm can be thought of as an acceleration of a basic local iterative method via auxiliary iterations on a hierarchy of coarser grids. At first glance, it appears that the mapping of the multigrid algorithm to a multiprocessor is as simple as it is for most other iterative solvers (like the Jacobi method). In these methods, adjacent blocks of grid points are assigned to adjacent processors. In this way, only local communication between neighboring processors is required to implement them. Unfortunately, the hierarchy of grids in the multigrid algorithm complicates the flow of data. As we form coarser and coarser grids, there are fewer and fewer points per level. This presents two difficulties. The first is that processors which contain adjacent grid points may not necessarily be neighbors. For large processor arrays this difficulty can be quite significant. We illustrate, however, that this problem is not a difficulty on a hypercube. This is because the hypercube interconnection topology "naturally" corresponds to the multigrid data flow in such a way that even on very coarse grids the communication distance is not large (if the proper mapping of data is used). A timing model for the communication and computation is presented for a simple parallel multigrid algorithm which uses an appropriate mapping to a hypercube. Using this model, we can predict the performance of the algorithm on a variety of hypercubes as well as analyze variations of the basic algorithm. We compare this execution model with our computer implementation of the parallel multigrid algorithm on an Intel hypercube and find excellent agreement.

The second difficulty in implementing parallel multigrid is that at coarse enough levels, a significant portion of the processors are idle (i.e., contain no grid points). With this in mind, we consider new methods which exploit the previously idle processors to accelerate convergence. Other ideas for obtaining additional parallelism have been considered in [9], [10] and [16]. Here we consider a method based on filtering, in which the current problem is split into multiple subproblems corresponding to different parts of the frequency spectrum. The idle processors can then be used to solve the additional problems. The idea is that these subproblems are easier to solve approximately than the original because they are governed by a smaller range of frequencies. In this spirit, the extra parallelism is accomplished using frequency decomposition ideas that are consistent with the multigrid approach. Model problem analysis is given to validate the ideas.

2.  MULTIGRID  ALGORITHMS  ON  HYPERCUBES. 

A  brief  sketch  of  the  serial  multigrid  algorithm  that  we  consider  follows.  We  assume  some 
familiarity  with  the  multigrid  algorithm.  A  good  introductory  reference  can  be  found  in  [13]. 


proc multigrid(f, u, level, pre_relax, post_relax, γ)
{
    if ( level = coarsest level ) then u = (A_level)^(-1) f
    else
        for k = 1 to pre_relax do Jacobi(f, u, level)
        compute_residual(f, u, level, residual)
        project_residual(level, residual, proj_res)
        v = 0
        for i = 1 to γ do multigrid(proj_res, v, level+1, pre_relax, post_relax, γ)
        interpolate(level, v, correction)
        u = u + correction
        for k = 1 to post_relax do Jacobi(f, u, level)
    endif
}

Notice that the commonly known "V" ("W") cycle corresponds to the value of γ equal to one (two).
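For concreteness, here is a runnable Python transcription of the procedure above for the 1D model problem -u'' = f with homogeneous Dirichlet conditions. The damping ω = 2/3 in the Jacobi sweep, full-weighting restriction, linear interpolation, and rediscretization on the coarse grid are our assumptions, since the pseudocode does not specify them; functions that return new arrays replace the in-place updates of the pseudocode.

```python
import numpy as np

def apply_A(u, h):
    """-u'' with homogeneous Dirichlet conditions; u holds interior values."""
    up = np.concatenate(([0.0], u, [0.0]))
    return (2 * up[1:-1] - up[:-2] - up[2:]) / h ** 2

def jacobi(f, u, h, omega=2 / 3):
    """One weighted-Jacobi sweep (omega = 2/3 is an assumed damping)."""
    return u + omega * (h ** 2 / 2) * (f - apply_A(u, h))

def restrict(r):
    """Full weighting: 2m - 1 fine points -> m - 1 coarse points."""
    return 0.25 * r[:-2:2] + 0.5 * r[1:-1:2] + 0.25 * r[2::2]

def interpolate(v):
    """Linear interpolation: m - 1 coarse points -> 2m - 1 fine points."""
    vp = np.concatenate(([0.0], v, [0.0]))
    fine = np.zeros(2 * len(v) + 1)
    fine[1::2] = v
    fine[0::2] = 0.5 * (vp[:-1] + vp[1:])
    return fine

def multigrid(f, u, h, pre_relax=2, post_relax=2, gamma=1):
    """gamma = 1 gives a V cycle, gamma = 2 a W cycle."""
    if len(f) == 1:                       # coarsest level: one interior point
        return f * h ** 2 / 2             # exact solve of (2 / h^2) u = f
    for _ in range(pre_relax):
        u = jacobi(f, u, h)
    residual = f - apply_A(u, h)
    proj_res = restrict(residual)
    v = np.zeros_like(proj_res)
    for _ in range(gamma):
        v = multigrid(proj_res, v, 2 * h, pre_relax, post_relax, gamma)
    u = u + interpolate(v)
    for _ in range(post_relax):
        u = jacobi(f, u, h)
    return u

n = 64
h = 1.0 / n
x = np.arange(1, n) * h
f = np.pi ** 2 * np.sin(np.pi * x)
u = np.zeros(n - 1)
r0 = np.linalg.norm(f - apply_A(u, h))
for cycle in range(10):
    u = multigrid(f, u, h, gamma=1)
r10 = np.linalg.norm(f - apply_A(u, h))
print(r10 / r0)
```

Passing `gamma=2` runs W cycles instead. Plain Jacobi (ω = 1) would be a poor smoother here, because it does not damp the highest-frequency mode.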

Let us consider a simple parallel implementation of a 1D multigrid algorithm. We assign blocks of contiguous grid points to different processors using a Gray code mapping. When the number of grid points is greater than the number of processors, the algorithm is straightforward. Each processor performs local operations on its grid points to implement relaxation, interpolation, and restriction (communicating boundary information when necessary). The difficulty occurs when the number of grid points is less than the number of processors (which will happen on the coarser grids if we have a large processor array). To simplify the discussion, we consider the case when there are n - 1 processors, each containing one grid point (where n is a power of two). The next coarser grid is defined by taking every other point from the fine grid. Notice that this implies that we will have many idle processors on the coarser grids. Consider processor number n/2. To perform residual projection, interpolation, and Jacobi iterations, this processor has the following communication needs:

finest grid level 0 : communicates with processors n/2 - 1 and n/2 + 1.
grid level 1 : communicates with processors n/2 - 2 and n/2 + 2.
grid level i : communicates with processors n/2 - 2^i and n/2 + 2^i.

Thus if we have a simple processor array that matches the PDE stencil, communication distances of 2^i are necessary on grid level i. Fortunately, this can be avoided in certain situations. We state without proof the following result. If a particular Gray code (specifically the binary reflected Gray code) is used to assign grid points to processors on a hypercube, then the processors that must communicate with each other in the multigrid algorithm are at most a distance of two away from each other (regardless of the level of the grid and the size of the hypercube). Further, there is a simple and efficient algorithm (due to Chan and Saad [3]) that allows one to shuffle the grid points to different processors before moving to a different level, so that we can maintain communication links of distance one. We omit the details and refer the reader to [3]. The key point is that by properly mapping a problem onto a hypercube, our communication needs remain local no matter how coarse the grid is compared to the size of the hypercube.
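The distance-two property of the binary reflected Gray code can be checked directly. In this sketch grid point j of a 1D grid lives on hypercube node gray(j), and on level i a point communicates with the point 2^i positions away; the hypercube distance is the Hamming distance between node labels. Only the static mapping is covered, not the Chan-Saad shuffle, and the helper names are illustrative.

```python
def gray(i):
    """Binary reflected Gray code of i."""
    return i ^ (i >> 1)

def hamming(a, b):
    """Hypercube distance between nodes a and b."""
    return bin(a ^ b).count("1")

def max_distance(dim):
    """Largest distance between the nodes holding grid points j and j + 2**level,
    for each level, on a 2**dim-node hypercube (1D grid, one point per node)."""
    n = 2 ** dim
    return {level: max(hamming(gray(j), gray(j + 2 ** level))
                       for j in range(n - 2 ** level))
            for level in range(dim)}

print(max_distance(6))   # {0: 1, 1: 2, 2: 2, 3: 2, 4: 2, 5: 2}
```

The result follows because gray(j) XOR gray(j + 2^i) = gray(j XOR (j + 2^i)), and j XOR (j + 2^i) is a single run of ones starting at bit i, whose Gray code has exactly two set bits when i ≥ 1 and one when i = 0.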

3.  MODELING  COMMUNICATION  AND  COMPUTATION. 

We model the execution time of this parallel multigrid algorithm for an elliptic equation with a five-point stencil on a two-dimensional grid with n - 1 interior grid points in each direction, using a p × p processor grid. The execution of one multigrid iteration consists of Jacobi sweeps, interpolation, residual projection, and "solving" the coarse-grid equation. There are two cases which must be analyzed separately. Specifically, when n ≥ p, each processor has (n/p × n/p) points. Thus when we communicate with our nearest neighbor we send messages of length n/p. On the other hand, when n < p we have some idle processors, and those processors which are not idle contain only one point. Therefore, communication with the nearest neighbors requires messages of length one. Notice that even if n > p on the fine grid, eventually on some level (i.e., on some coarse grid) n will be less than p.

We  define  the  following  notation: 

T(n) : time to perform one multigrid iteration on an n × n grid using p processors.
γ : number of multigrid iterations done on each level.
α + βn : time to communicate a message of length n between neighboring processors.
t : time to compute one Jacobi sweep at one point on the grid.
ν : total number of Jacobi sweeps that are performed on each multigrid level (i.e., pre_relax + post_relax).
r : time to compute the residual at one point.
ρ : time to project one point of the residual onto the coarse grid.
i : time to interpolate from the coarse grid and apply the correction to the previous approximation.
M = νt + r + ρ + i : the computation time on one level for one point.

We assume in this analysis that the hypercube has bi-directional simultaneous send and receive, but each node can only send (receive) one message at a time. In addition, we assume that n and p are powers of two and that we continue forming coarser and coarser grids until we have one grid point. If we count the arithmetic operations for the case n > p, we have:

Jacobi sweeps : ν(t(n/p)² + 4(α + β(n/p))).
 - receive information on all four boundaries and compute the new approximation at all (n/p)² points.

compute residual : r(n/p)² + 4(α + β(n/p)).
 - receive information on all four boundaries and compute the residual at all (n/p)² points.

project residual : ρ(n/p)² + 4(α + β(n/p)).
 - receive information on all four boundaries and project the residual at all (n/p)² points.

interpolate : i(n/p)² + 4(α + β(n/2p)) + 4(α + β).
 - receive information on all four boundaries as well as information on the corners to interpolate the correction at all (n/p)² points.
We can now combine these to obtain a recurrence relation for the execution time of the multigrid algorithm. For n ≥ p we have

$$T_1(n) = \gamma T_1(n/2) + M(n/p)^2 + [4\nu + 10]\,\beta\,(n/p) + [4\nu + 16]\,\alpha + 4\beta. \qquad (1)$$

For n = p we have the initial relation

$$T_1(p) = T_2(p).$$

Doing a similar analysis for the case n ≤ p (using the Chan-Saad shuffle algorithm to maintain communication distances of length one), we get

$$T_2(n) = \gamma T_2(n/2) + M + [20 + 4\nu](\alpha + \beta) \qquad (2)$$

with initial condition

$$T_2(2) = t.$$

(Note that n = 2 corresponds to the coarsest grid with one interior grid point.) Note that these formulas are only valid for p ≥ 2, as the assumptions of sending and receiving on four boundaries are not valid for smaller systems. Solving the above recurrence relations, we get

γ = 1 :

$$T_1(n) = \tfrac{4}{3}M\big[(n/p)^2 - 1\big] + d_1\big[(n/p) - 1\big] + d_2\log_2(n/p) + d_3\log_2(p/2) + t \qquad (3)$$

(This result was first derived in [7].)

γ = 2 :

$$T_1(n) = 2M(n/p)^2\big[1 - (p/n)\big] + \tfrac{1}{2}d_1(n/p)\log_2(n/p) + d_2\big[(n/p) - 1\big] + d_3\big[(n/2) - (n/p)\big] + nt/2 \qquad (4)$$

where

$$d_1 = (8\nu + 20)\beta, \qquad d_2 = (4\nu + 16)\alpha + 4\beta, \qquad d_3 = M + (20 + 4\nu)(\alpha + \beta).$$

Notice that when the ratio n/p is large, the first term dominates, and so $T_1(n) \approx \tfrac{4}{3}M(n/p)^2$. Thus when the number of points per processor is large, the execution time is reduced by almost p, which is in fact the maximum attainable speedup on a p-node hypercube. On the other hand, when n/p = 1 and p is large, then $T_1(n) \approx d_3\log_2(p/2)$ for the "V" cycle and $T_1(n) \approx p(d_3 + t)/2$ for the "W" cycle. These results are consistent with those in [6], where results can be found for larger values of γ as well. We shall see in section 5 that in some sense the estimates for the "V" cycle are asymptotically optimal.
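The recurrences are easy to evaluate directly, which gives a check on the closed form (3). In this sketch n and p are powers of two, p is the quantity that enters (n/p)² in the model, and the parameter values are illustrative only: α and β are the Mark II double-precision entries of table 1, while M, ν and t are arbitrary choices.

```python
import math

def T2(n, gamma, M, nu, alpha, beta, t):
    """Recurrence (2) for n <= p, with T2(2) = t."""
    if n == 2:
        return t
    return gamma * T2(n // 2, gamma, M, nu, alpha, beta, t) + M + (20 + 4 * nu) * (alpha + beta)

def T1(n, p, gamma, M, nu, alpha, beta, t):
    """Recurrence (1) for n >= p, switching to (2) at n = p."""
    if n == p:
        return T2(p, gamma, M, nu, alpha, beta, t)
    q = n / p
    return (gamma * T1(n // 2, p, gamma, M, nu, alpha, beta, t)
            + M * q ** 2 + (4 * nu + 10) * beta * q + (4 * nu + 16) * alpha + 4 * beta)

def T1_vcycle_closed_form(n, p, M, nu, alpha, beta, t):
    """Closed form (3) for gamma = 1."""
    d1 = (8 * nu + 20) * beta
    d2 = (4 * nu + 16) * alpha + 4 * beta
    d3 = M + (20 + 4 * nu) * (alpha + beta)
    q = n / p
    return (4 / 3 * M * (q ** 2 - 1) + d1 * (q - 1) + d2 * math.log2(q)
            + d3 * math.log2(p / 2) + t)

params = dict(M=1e-4, nu=4, alpha=8.8e-5, beta=8.4e-5, t=2e-5)   # illustrative
print(T1(256, 16, 1, **params), T1_vcycle_closed_form(256, 16, **params))
print(T1(256, 16, 2, **params))   # W cycle, evaluated from the recurrence
```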

4.  NUMERICAL  EXPERIMENTS. 

The model can now be used to predict the actual execution of the parallel multigrid algorithm under different assumptions. To check the accuracy of the model, a computer code of this parallel multigrid algorithm was implemented on the Intel iPSC hypercube. This code was used to solve Poisson's equation ($u_{xx} + u_{yy} = f(x,y)$) on a square grid. The Dirichlet boundary conditions as well as the function f(x,y) were chosen so that the exact solution was u(x,y) = x + y. On each grid level four Jacobi relaxation iterations were done (ν = 4). In the current version of the multigrid code there is no convergence checking. Timing experiments of the parallel multigrid algorithm were run using both four and sixteen nodes. The processors were assigned to subdomains using the binary reflected Gray code (in the x and y directions). The execution runtimes for one multigrid iteration (averaged over a sequence of iterations) on grids of various sizes are depicted by the dots in figure 1. The solid lines in figure 1 are the predicted runtimes for one multigrid iteration using the machine parameters for both the Intel iPSC and the Caltech Mark II hypercube (using sixteen nodes) for different grid sizes.
FIG 1 : actual (dots) and predicted (solid lines) execution times of one multigrid iteration vs. n using 16 processors for grids of size n × n.

The values for α, β and the computation speed were derived from data in [14] for double-precision numbers and can be found in table 1. The value of M in the recurrences is simply the number of flops per multigrid iteration for one point multiplied by the computation speed. Note that the values for the parameters of the other machines (given in table 1) are only approximate and are extracted from data in [17] and [11].


TABLE 1 : approximate machine parameters

machine                        α          β            computation speed
Mark II (double precision)     8.8 e-5    8.4 e-5      25 e-6
Mark II (single precision)     8.8 e-5    4.2 e-5      25 e-6
Intel (double) n <= 128        7.0 e-3    6.0 e-6      25 e-6
Intel (double) n >= 128        6.0 e-3    5.2 e-4      25 e-6
Connection Machine (single)    0.0        1.0 e-3      40 e-5
FPS (single)                   1.8 e-5    (illegible)  15 e-8

The close correspondence between the actual runtimes and the predicted runtimes is an indication that the execution time model accurately reflects the runtimes of the parallel multigrid algorithm.

One measure of the utilization of a parallel computer is efficiency: the ratio of the execution time of the algorithm on a serial machine whose processor executes P times faster than the processors of the parallel machine to the execution time of the parallel algorithm running on P processors. The efficiency plots shown in figure 2 (generated from the model) indicate how large the ratio n/p must be before we are close to the maximum attainable speedup. For the Caltech machine, we see that for n/p ≈ 16 we reach 80 percent efficiency. This indicates that the ratio n/p does not have to be large before we get nearly the maximum speedup. As expected, machines with slower communication, such as the Intel iPSC, need a larger n/p ratio to be used efficiently.

FIG 2: predicted efficiency of one multigrid iteration vs. n using 16 processors for grids of size n x n.
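The efficiency definition can be made concrete with a toy cost model. The sketch below is not the paper's recurrence (which appears earlier and is not reproduced here): the per-level cost of m·(points on the busiest processor) plus four neighbor exchanges of α + β·(edge length), the function names, and the figure of 20 flops per point are all assumptions for illustration, with m playing the role of M.

```python
def vcycle_time(n, p, alpha, beta, m):
    """Toy time for one V-cycle: n x n grid on a p x p processor grid.
    Each level costs m * (points on the busiest processor) for computation
    plus four neighbor exchanges of alpha + beta * (edge length)."""
    t = 0.0
    nl = n
    while nl >= 1:
        side = -(-nl // p)                 # ceil(nl / p): edge length per processor
        t += m * side * side               # computation
        t += 4 * (alpha + beta * side)     # communication with 4 neighbors
        nl //= 2                           # move to the next coarser grid
    return t

def serial_work(n, m):
    """Total computation of one V-cycle on a single processor."""
    t = 0.0
    nl = n
    while nl >= 1:
        t += m * nl * nl
        nl //= 2
    return t

def efficiency(n, p, alpha, beta, m):
    """Time on a serial machine P = p*p times faster, divided by parallel time."""
    return (serial_work(n, m) / (p * p)) / vcycle_time(n, p, alpha, beta, m)

# Mark II (double precision) alpha, beta and computation speed from table 1;
# 20 flops per point per iteration is an assumed value.
alpha, beta, speed = 8.8e-5, 8.4e-5, 25e-6
for n in (32, 64, 128, 256):
    print(n, round(efficiency(n, 4, alpha, beta, 20 * speed), 3))
```

Under these assumptions the efficiency can never exceed 1 and rises toward 1 as n/p grows, which is the qualitative behavior described for figure 2.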

The model can also be used to study the influence of various machine and algorithm parameters. We first consider the loss of efficiency on large processor systems due to load imbalance (idle processors on coarse grids) and communication costs. In figure 3, we plot the efficiency of solving a 256 by 256 grid problem on machines with varying numbers of processors (using the "V" cycle multigrid algorithm). The different plots correspond to machines with different communication and computation parameters. The top curve illustrates the sharp loss in efficiency (when p is large) even for a machine with no communication delays, that is, the loss in efficiency due to load imbalance alone. The second curve from the top illustrates the efficiency of a hypothetical machine with communication speeds ten times faster than the Caltech Mark II and computation speeds ten times slower. We can see that as the communication speed is increased relative to the computation speed, the loss in efficiency due to communication becomes negligible. In general, these results indicate that for large processor systems with a small computational speed per processor and fast communication (compared with the computational speed), there is very little inefficiency due to communication. However, all large processor machines will be inefficient when running this multigrid algorithm, because of the load balancing problem. This inefficiency on large processor machines has been reported by McBryan in [17] for his multigrid code on the Connection Machine.


FIG 3: predicted efficiency of one multigrid iteration vs. p using p processors for grids of size 256 x 256.

Finally, in figure 4 we compare the "V" cycle with the "W" cycle in terms of efficiency for the Connection Machine parameters. As expected, the "W" cycle is far less efficient than the "V" cycle on such large processor systems. This is due to the larger amount of time spent on the coarse grids in the "W" cycle. Based on cost considerations alone, it appears inadvisable to use "W" cycles instead of "V" cycles on large processor machines. However, the "W" cycle may have better convergence properties and may be more robust than the "V" cycle.
FIG 4: efficiency comparison between V and W cycles vs. p using p processors for a grid of size 256 x 256.
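The load-imbalance effect behind figure 4 can be isolated with an even simpler count that ignores communication entirely (as for the top curve of figure 3). In this hypothetical sketch level l is visited γ^l times (γ = 1 for a V cycle, γ = 2 for a W cycle, an idealized visit count), and the busiest processor's work on each level determines the parallel time.

```python
def cycle_efficiency(n, p, gamma):
    """Efficiency of one multigrid cycle for an n x n problem on a p x p
    processor grid, counting computation only (no communication cost).
    gamma = 1 models a V cycle, gamma = 2 a W cycle: level l is visited
    gamma**l times."""
    serial = 0
    parallel = 0
    nl, level = n, 0
    while nl >= 1:
        visits = gamma ** level
        side = -(-nl // p)                 # ceil(nl / p) points per processor edge
        serial += visits * nl * nl         # work if done on one processor
        parallel += visits * side * side   # work of the busiest processor
        nl //= 2
        level += 1
    return serial / (p * p * parallel)

for p in (1, 4, 16, 64, 256):
    print(p, round(cycle_efficiency(256, p, 1), 3), round(cycle_efficiency(256, p, 2), 3))
```

With one processor both cycles have efficiency 1; as p grows, the W cycle's extra coarse-grid visits, during which most processors are idle, make it far less efficient than the V cycle.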


5. A LOWER BOUND BASED ON DATA FLOW IN SOLVING PARTIAL DIFFERENTIAL EQUATIONS.

In this section, we introduce a data flow view of algorithms for solving elliptic partial differential equations (pdes) and use it to derive a lower bound on the execution time for solving such equations.

What is the optimal asymptotic time for solving a partial differential equation numerically on a multiprocessor system? Let us assume that we have N grid points partitioned amongst P processors (that is, each processor has N/P points). From our previous discussion, N = n^2 and P = p^2. To simplify the argument, we seek a lower bound on the time for computing the solution at just one grid point; this also provides a lower bound for computing the solution at all points. Solving a general elliptic partial differential equation requires some information from all the points in the interior, as can be seen from the global nature of the Green's function formulation of the solution. Thus we consider the time it takes to collapse the information from N points into one point. The best that we can do within each processor is O(N/P), since each point must be visited. The optimal time to combine the P pieces of information (one value per processor) into one number is O(log P). Thus a lower bound on the time for solving an elliptic partial differential equation is O(N/P + log P). This result is consistent with [20], where more details can be found.
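The O(N/P + log P) argument has two phases: a local pass over each processor's points and a combining phase. A minimal sketch of the combining phase on a hypercube uses recursive doubling, which takes log2 P exchange steps; the function names are illustrative only.

```python
def local_sum(points):
    """O(N/P) phase: each processor visits each of its points once."""
    total = 0
    for value in points:
        total += value
    return total

def hypercube_allreduce(values):
    """Recursive doubling on a hypercube with P = 2**d processors.
    In step k every processor i adds the partial sum of the neighbor whose
    address differs in bit k; after d = log2(P) steps every processor holds
    the global sum."""
    P = len(values)
    d = P.bit_length() - 1
    if 1 << d != P:
        raise ValueError("number of processors must be a power of two")
    v = list(values)
    for k in range(d):
        v = [v[i] + v[i ^ (1 << k)] for i in range(P)]
    return v, d

# 64 grid points spread over 8 processors (N/P = 8 points each)
grid = list(range(64))
partials = [local_sum(grid[8 * q: 8 * q + 8]) for q in range(8)]
sums, steps = hypercube_allreduce(partials)
print(sums[0], steps)
```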

Our previous discussion concluded that the "V" cycle algorithm can be implemented in O(log P) time when n/p ≈ 1 and p is large. Essentially, this bound is achievable because the hypercube allows a processor to communicate information globally (to all other processors) in O(log p) time. Since the multigrid algorithm converges in a constant number of iterations, the time for executing the entire algorithm is O(log p) when we have one point per processor. Thus the multigrid algorithm attains the lower bound for the solution of an elliptic pde when there is one point per processor. When n/p >> 1, the parallel multigrid algorithm executes in O(N/P) time. Therefore, given the optimal nature of serial multigrid, parallel multigrid can also be considered optimal when there are many points per processor. So both when there are many points per processor and when there is only one point per processor, the parallel multigrid algorithm can be considered asymptotically optimal. It is interesting to note that this optimal behavior is achieved even with the many idle processors that result when "solving" on the coarse grids.

This data flow point of view is also an interesting way to obtain lower bounds on the convergence of a numerical method. Usually, the information that is furthest from any given point is on the boundary. We can get a lower bound on the number of iterations needed to reach convergence by determining the time it takes for the boundary information to reach all points in the interior. For example, consider the Jacobi method applied to the 1D Poisson equation. Since each Jacobi sweep moves information by only one grid point, it takes O(n) iterations before the boundary information propagates to all the interior points (on an n point grid). Thus a lower bound on the number of iterations for the Jacobi method is O(n). (In fact, it takes O(n^2).) From this point of view, we can see why the multigrid algorithm yields such rapid convergence: one multigrid iteration propagates the boundary information to all the points in the interior. Thus the lower bound on the convergence of multigrid is O(1) iterations, which is the actual convergence rate.
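The data flow argument can be checked numerically for the 1D Poisson equation u'' = 0 with boundary values 1 (so the exact solution is u = 1). This is a hypothetical illustration, not code from the paper: it counts the Jacobi sweeps needed before every interior point has been influenced by the boundary (about n/2 with information entering from both ends, i.e. O(n)), and the sweeps needed to reduce the error below a tolerance, which grow roughly fourfold when n doubles (O(n^2)).

```python
def jacobi_sweep(u):
    """One Jacobi sweep for u'' = 0 with fixed boundary values u[0], u[-1]."""
    return [u[0]] + [(u[i - 1] + u[i + 1]) / 2 for i in range(1, len(u) - 1)] + [u[-1]]

def sweeps_until_all_informed(n):
    """Boundary values 1, interior starts at 0. Count sweeps until every
    interior point has received some boundary information (is nonzero)."""
    u = [1.0] + [0.0] * (n - 1) + [1.0]
    sweeps = 0
    while any(x == 0.0 for x in u[1:-1]):
        u = jacobi_sweep(u)
        sweeps += 1
    return sweeps

def sweeps_until_converged(n, tol=1e-3):
    """Count sweeps until the maximum error |1 - u| drops below tol."""
    u = [1.0] + [0.0] * (n - 1) + [1.0]
    sweeps = 0
    while max(abs(1.0 - x) for x in u[1:-1]) > tol:
        u = jacobi_sweep(u)
        sweeps += 1
    return sweeps

for n in (16, 32):
    print(n, sweeps_until_all_informed(n), sweeps_until_converged(n))
```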

We have seen that the communication of global information is essential in rapidly converging methods. Global information is also used in many acceleration schemes as well as in error checking. One nice property of the hypercube is that it allows one to communicate globally in logarithmic time. As a final note, it is interesting that convergence checking within the multigrid algorithm can be performed with almost no overhead. For example, if we use as a measure of convergence the residuals, which are already computed in the multigrid algorithm, the norm of the residual vector on the finest grid can be accumulated at the coarsest level using a tree sum that is integrated into the multigrid algorithm. In fact, with careful programming, the norm of the fine grid residual can be sent in the same messages that are used to transmit the residuals on the lower levels. With this method, convergence is detected only after one additional multigrid iteration has been performed.
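A sketch of such a tree sum for the residual norm, assuming a p x p processor grid in which each coarsening merges 2 x 2 blocks of processors, so that the partial sums of squared residuals could travel with the restricted residuals. The function names and data layout are hypothetical.

```python
import math

def coarsen_sums(partial):
    """Merge 2 x 2 blocks of per-processor partial sums, mirroring how four
    fine-grid processors hand their data to one coarse-grid processor."""
    m = len(partial) // 2
    return [[partial[2 * i][2 * j] + partial[2 * i + 1][2 * j]
             + partial[2 * i][2 * j + 1] + partial[2 * i + 1][2 * j + 1]
             for j in range(m)] for i in range(m)]

def residual_norm_during_cycle(local_residuals):
    """local_residuals[i][j] lists the fine-grid residuals held by processor
    (i, j) of a p x p processor grid, p a power of two. Returns the 2-norm of
    the fine-grid residual and the number of coarsening levels used."""
    partial = [[sum(x * x for x in cell) for cell in row] for row in local_residuals]
    levels = 0
    while len(partial) > 1:
        partial = coarsen_sums(partial)   # would travel with the restricted residuals
        levels += 1
    return math.sqrt(partial[0][0]), levels

# 4 x 4 processors, two residual values each (made-up data)
res = [[[float(i + j), float(i - j)] for j in range(4)] for i in range(4)]
print(residual_norm_during_cycle(res))
```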

6. FILTERING.

We now consider the "idle processor problem" that occurs on coarse grids when using large processing systems. We have already shown that on large hypercubes with fewer points than processors, it is possible to keep communication local by properly mapping the problem. Unfortunately, as we form coarser and coarser grids, a higher percentage of the processors is idle, which in turn reduces the efficiency of the entire process. More specifically, if we have one point per processor, a total of p points, and we continue coarsening down to a one point mesh, then the fraction of utilized processors, averaged over the grid levels, is

$$\frac{1}{\log p + 1}\sum_{i=0}^{\log p}\left(\frac{1}{4}\right)^{i} \approx \frac{4}{3\log p + 3}.$$

A similar analysis for a 3D problem shows that the fraction of utilized processors is about 8/(7 log p + 7). Thus, utilization varies inversely with the logarithm of the total number of available processors.
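The utilization formula can be evaluated directly. In this sketch m stands for the number of coarsenings (our reading of log p: the logarithm to base 4 of the number of points in 2D, base 8 in 3D), and it is assumed that each grid level takes the same time, since each active processor holds one point.

```python
def average_utilization(m, dim=2):
    """Fraction of processors busy, averaged over the m + 1 grid levels,
    when each coarsening divides the number of points by 2**dim."""
    ratio = 1.0 / 2 ** dim
    return sum(ratio ** i for i in range(m + 1)) / (m + 1)

def approx_utilization(m, dim=2):
    """Closed-form approximation: 4/(3m + 3) in 2D, 8/(7m + 7) in 3D."""
    c = 2 ** dim
    return c / ((c - 1) * (m + 1))

for m in (2, 5, 10):
    print(m, average_utilization(m), approx_utilization(m), average_utilization(m, 3))
```

The approximation drops only the geometrically small factor (1 - 4^-(m+1)) in 2D, so for more than a few levels the two agree closely.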




Unfortunately, there is little room in the usual multigrid algorithm to partition the work amongst the available processors: when there is only one point per processor, only a few operations are performed before moving to the next coarser level. It is therefore probably more expensive to move the data and split these simple computations than to perform the calculations locally. Instead, we seek to modify the basic multigrid method so that the previously idle processors are used to accelerate convergence.

Consider the following new method. After relaxation on level one, compute the residual as in standard multigrid. The residual equation is then

$$A_1 x = r. \qquad (5)$$

Instead of "solving" this equation using a coarse grid, split the residual into multiple pieces (in this example two pieces)

$$r = r_1 + r_2 \qquad (6)$$

and consider solving

$$A_1 x_1 = r_1 \quad\text{and}\quad A_1 x_2 = r_2. \qquad (7)$$

Approximate $x_1$ by solving on a coarse grid and approximate $x_2$ by relaxation on the fine grid. The approximation for x is taken as the sum of the approximations for $x_1$ and $x_2$. The key point is that by splitting the residual in an appropriate manner, it is possible to take advantage of the nature of the components to achieve a fast method.

To study the residual splitting we define the matrix P. Let

$$r_1 = P r \quad\text{and}\quad r_2 = (I - P) r.$$

The algorithm's success relies on the choice of P. What properties must P possess? Since $x_1$ is approximated on a coarse grid, P should smooth the residual; that is, P should damp the high and middle frequencies so that the coarse grid equation better reflects the fine grid equation. In solving for $x_2$, we use a relaxation scheme (and relaxation schemes are in general poor at reducing low frequencies). Therefore, an additional requirement on P is that the operator $(I - P)$ produce a solution $x_2$ with small low frequency components. In summary, P should:

1) reduce the high frequency components in r (and thus in x).

2) hardly alter the low frequency components in r.
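To illustrate properties 1) and 2), the sketch below uses the filter P = tridiag(.25, .5, .25) that appears in section 8 (with k = 1) and splits the smoothest and the most oscillatory sine modes; the function names are our own.

```python
import math

def apply_filter(r, k=1.0):
    """P = k * tridiag(.25, .5, .25) with zero values outside the grid."""
    n = len(r)
    out = []
    for i in range(n):
        left = r[i - 1] if i > 0 else 0.0
        right = r[i + 1] if i < n - 1 else 0.0
        out.append(k * (0.25 * left + 0.5 * r[i] + 0.25 * right))
    return out

def split_residual(r, k=1.0):
    """r = r1 + r2 with r1 = P r (coarse grid part) and r2 = (I - P) r."""
    r1 = apply_filter(r, k)
    r2 = [a - b for a, b in zip(r, r1)]
    return r1, r2

def norm(v):
    return math.sqrt(sum(x * x for x in v))

N = 32
for i in (1, N - 1):   # smoothest and most oscillatory sine modes
    r = [math.sin(i * j * math.pi / N) for j in range(1, N)]
    r1, r2 = split_residual(r)
    print(i, round(norm(r1) / norm(r), 4), round(norm(r2) / norm(r), 4))
```

For the smoothest mode almost all of r goes to $r_1$, while for the most oscillatory mode almost all of it goes to $r_2$.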

Notice that these properties of P are similar to those of the iteration matrix of a smoothing relaxation. However, a big advantage is that the choice of P is not governed by the differential operator (as is the choice of the iteration matrix for relaxation). This is because P is used only to split the residual, and hence the error, into two components, not to remove a component of the error as in relaxation. This means that we have more freedom in choosing P than in choosing a relaxation iteration matrix.

Let us now consider a splitting operator for one dimensional problems. We first restrict P to be a tridiagonal matrix with constants on each diagonal:

$$P = \mathrm{tridiag}(a_1, a_0, a_1). \qquad (8)$$

The eigenvalues of P are well known:

$$\lambda_i = a_0 + 2 a_1 \cos(i\pi/N), \qquad (9)$$

and the eigenvectors are

$$\{y_i\}_j = \sin(ij\pi/N), \qquad (10)$$

where the matrix P is of dimension $(N-1) \times (N-1)$.
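Formulas (9) and (10) can be checked directly by applying P to each sine vector with zero boundary values; this is only a numerical sanity check, with illustrative function names.

```python
import math

def tridiag_apply(a1, a0, v):
    """Apply P = tridiag(a1, a0, a1) of size len(v) with zero boundary values."""
    n = len(v)
    return [a0 * v[j]
            + (a1 * v[j - 1] if j > 0 else 0.0)
            + (a1 * v[j + 1] if j < n - 1 else 0.0) for j in range(n)]

def check_eigenpairs(a1, a0, N):
    """Largest |P y_i - lambda_i y_i| over i = 1, ..., N-1, with
    lambda_i = a0 + 2 a1 cos(i pi / N) and (y_i)_j = sin(i j pi / N)."""
    worst = 0.0
    for i in range(1, N):
        y = [math.sin(i * j * math.pi / N) for j in range(1, N)]
        lam = a0 + 2 * a1 * math.cos(i * math.pi / N)
        Py = tridiag_apply(a1, a0, y)
        worst = max(worst, max(abs(p - lam * yj) for p, yj in zip(Py, y)))
    return worst

print(check_eigenpairs(0.25, 0.5, 16))   # close to machine precision
```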


One nice property of P is that it is convenient for parallel processing, because the operator acts locally: computing Px at the point i requires only information from the left and right neighbors of point i. This usually corresponds to low communication overhead on the parallel machine. If we are willing to pay a slightly higher cost in communication and computation, we can use more general matrices. By allowing band matrices with 2k + 1 nonzero diagonals, it is possible to produce a matrix P with the same eigenvectors given in (10) and eigenvalues of the form

$$\lambda_i = a_0 + a_1\cos(\theta_i) + a_2\cos^2(\theta_i) + \cdots + a_k\cos^k(\theta_i), \qquad (11)$$

where $\theta_i = i\pi/N$.
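One way to build such a banded P (an assumption on our part; the text does not say how its band matrices are constructed) is as a polynomial in T = tridiag(1/2, 0, 1/2), which by (9) and (10) has eigenvalues $\cos(\theta_i)$ and the eigenvectors (10). Then $P = a_0 I + a_1 T + \cdots + a_k T^k$ has 2k + 1 nonzero diagonals and the eigenvalues (11); near the first and last rows it is not exactly constant along its diagonals. The coefficients below are hypothetical.

```python
import math

def apply_T(v):
    """T = tridiag(1/2, 0, 1/2) with zero boundary values."""
    n = len(v)
    return [0.5 * ((v[j - 1] if j > 0 else 0.0) + (v[j + 1] if j < n - 1 else 0.0))
            for j in range(n)]

def apply_poly_filter(coeffs, v):
    """Apply P = a0 I + a1 T + ... + ak T^k by Horner's rule; coeffs = [a0, ..., ak]."""
    w = [coeffs[-1] * x for x in v]
    for a in reversed(coeffs[:-1]):
        w = [tw + a * x for tw, x in zip(apply_T(w), v)]
    return w

N = 16
coeffs = [0.3, 0.2, 0.1]             # hypothetical a0, a1, a2 (k = 2)
e = [0.0] * (N - 1)
e[7] = 1.0                           # unit vector at an interior point
print(sum(1 for x in apply_poly_filter(coeffs, e) if x != 0.0))  # band width 2k + 1 = 5
```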

7. ERROR EXPRESSIONS FOR A TWO-LEVEL MULTIGRID ALGORITHM.

When solving a problem via multigrid, there is an underlying residual equation for the correction to the current approximation. We denote the correction equation as

$$A_1 x = r. \qquad (12)$$

In our simple parallel algorithm we "solve"

$$A_1 x_1 = P r \qquad (13)$$

on a coarse grid. The approximate solution obtained is

$$\hat{x}_1 = I_2 A_2^{-1} R_2 P r, \qquad (14)$$

where $R_2$ and $I_2$ are the appropriate restriction and prolongation operators, and $A_2$ is the differential operator on the coarse grid. The error in this coarse grid approximation is

$$A_1^{-1} P r - I_2 A_2^{-1} R_2 P r. \qquad (15)$$

In parallel, we "solve"

$$A_1 x_2 = (I - P) r \qquad (16)$$

using a relaxation scheme on the fine grid. Let $G_2$ represent the iteration matrix of the relaxation process, and let our initial guess for $x_2$ be identically zero. Then the error after performing n relaxation sweeps is

$$x_2 - \hat{x}_2 = G_2^{n} A_1^{-1} (I - P) r. \qquad (17)$$

We obtain an expression for the error in the parallel two-level multigrid algorithm by simply adding the errors in $\hat{x}_1$ and $\hat{x}_2$:

$$e_{i+1} = A_1^{-1} P r - I_2 A_2^{-1} R_2 P r + G_2^{n} A_1^{-1} (I - P) r. \qquad (18)$$

Assuming that r is the residual of an approximation obtained by performing ν iterations of a relaxation scheme with iteration matrix $G_1$ on an approximation with error $e_i$, we have $r = A_1 G_1^{\nu} e_i$. Combining these, we get

$$e_{i+1} = \left(A_1^{-1} P + G_2^{n} A_1^{-1}(I - P) - I_2 A_2^{-1} R_2 P\right) A_1 G_1^{\nu} e_i. \qquad (19)$$

We will use this expression in evaluating the convergence rate of the parallel multigrid algorithm.


8. CONVERGENCE RATE FOR A MODEL PROBLEM.

In this section we propose an algorithm for the one dimensional Poisson equation and analyze its convergence.

We first discretize Poisson's equation $u'' = f$ using central differences to get the matrix

$$A_1 = N^2\,\mathrm{tridiag}(1, -2, 1). \qquad (20)$$

So our problem is to solve

$$A_1 u = f, \qquad (21)$$

where u and f are vectors. We now specify the details of our algorithm on this problem.
The coarse grid is defined by taking the even numbered points of the fine grid. For the restriction operator we choose the trivial injection operator, and we use linear interpolation for the prolongation operator. The relaxation procedure is a simple damped Jacobi method with iteration matrix

$$G = I + t h^2 A_1/4, \qquad (22)$$

where t is a damping parameter. For the pre-relaxation, t = 1, to reduce the high frequency errors. For the after-splitting relaxation, t = 1.5, which is chosen to reduce errors in the middle frequency range.
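By (22), since $h^2 A_1/4$ has eigenvalues $-\sin^2(i\pi/2N)$ on the sine modes, the damped Jacobi iteration matrix has eigenvalues $1 - t\sin^2(i\pi/2N)$. The short sketch below (illustrative names) tabulates them: with t = 1 the most strongly damped mode is the highest frequency, while with t = 1.5 the factor vanishes where $\sin^2 = 2/3$, near i/N ≈ 0.61, in the middle of the spectrum.

```python
import math

def jacobi_factor(i, N, t):
    """Eigenvalue of G = I + t h^2 A_1 / 4 on the mode sin(i j pi / N):
    h^2 A_1 / 4 has eigenvalue -sin^2(i pi / 2N), so G has 1 - t sin^2(i pi / 2N)."""
    return 1.0 - t * math.sin(i * math.pi / (2 * N)) ** 2

N = 64
for t in (1.0, 1.5):
    factors = [jacobi_factor(i, N, t) for i in range(1, N)]
    best = min(range(1, N), key=lambda i: abs(factors[i - 1]))
    print(t, "most damped mode i =", best, "factor at i = N-1:", round(factors[-1], 4))
```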

Finally, we perform a residual splitting as described in section 6. In our algorithm, half the processors process the coarse grid with $r_1$ as the right-hand side. The other processors work on the fine grid with $r_2$ as the right-hand side. In our analysis, we assume that the splitting operator has the same eigenvectors as the discrete Poisson operator, and we denote its eigenvalues by $\lambda_i$.

This specifies the algorithm, and we may proceed with the analysis. Because of the choice of operators, we are able to determine the exact eigenvalues of the matrix

$$T = \left(A_1^{-1} P + G_2^{n} A_1^{-1}(I - P) - I_2 A_2^{-1} R_2 P\right) A_1 G_1^{\nu}, \qquad (23)$$

which governs the behavior of the error in our multigrid process ($e_{i+1} = T e_i$). This calculation is somewhat tedious and is omitted (see [8] for details). The basic analysis is similar to the standard convergence analysis for Poisson's equation on a square (see [13]). The result of this analysis is that the eigenvalues of T are equal to the eigenvalues of a series of 2 x 2 matrices $T_i$, where


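$$T_i = \begin{pmatrix} c_i^{2\nu} S_i^{n} (1 - \lambda_i) & c_i^{2} s_i^{2\nu-2} \lambda_{\tilde{\imath}} \\ s_i^{2} c_i^{2\nu-2} \lambda_i & s_i^{2\nu} S_{\tilde{\imath}}^{n} (1 - \lambda_{\tilde{\imath}}) \end{pmatrix} \qquad (24)$$

and i ranges from 1 to N/2. In the above expression,

$$s_i = \sin(i\pi/2N), \quad c_i = \cos(i\pi/2N), \quad S_i = 1 - t s_i^2, \quad \tilde{\imath} = \tilde{\imath}(i) = N - i.$$

In order to determine the best filters and concurrent relaxation process, we seek to minimize the maximum eigenvalues of these matrices.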

Convergence rate estimates were made by computing the eigenvalues of the two-grid operator (using equation (24)) for a simple tridiagonal filter on the one dimensional Poisson equation. A comparison of the convergence rate using the filter

$$P = k\,\mathrm{tridiag}(.25, .5, .25)$$

(vs. using no filter) is shown here for a problem on a 63 point grid. The parameter k determines how much of the smoothed residual is projected onto the coarse grid (intuitively, the smoother the residual, the closer to 1 k should be chosen). The entries in the third column denote estimates of the number of fine grid relaxations that can be done concurrently with the coarse grid correction.
no. of pre-relaxation Jacobi sweeps | k | no. of concurrent relaxations | convergence rate with filter | convergence rate without filter
0 | .666 | 4 | .333 | 1.0
1 | .8 | 5 | .2 | .5
2 | .888 | 6 | .111 | .25

TABLE 2: comparison of convergence rate with and without diagonal filter
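Equation (24) can be evaluated directly. The sketch below assumes pre-relaxation with t = 1 (so $G_1$ has eigenvalues $c_i^2$), concurrent relaxation with t = 1.5 (eigenvalues $S_i = 1 - t s_i^2$), and filter eigenvalues from (9) with $a_0 = .5k$ and $a_1 = .25k$, which gives $\lambda_i = k(1 + \cos(i\pi/N))/2 = k c_i^2$ and, for the partner mode N - i, $k s_i^2$; without a filter (P = I) all the λ equal 1. The function name is our own. In our check it reproduces the "without filter" column and agrees with the "with filter" column to within 0.01.

```python
import math

def rate_2x2(N, nu, n_conc, k=1.0, t=1.5, use_filter=True):
    """Largest |eigenvalue| over the 2 x 2 matrices T_i of (24), i = 1..N/2.
    nu: pre-relaxation sweeps (t = 1); n_conc: concurrent sweeps with damping t;
    filter P = k tridiag(.25, .5, .25), or P = I when use_filter is False."""
    rho = 0.0
    for i in range(1, N // 2 + 1):
        s2 = math.sin(i * math.pi / (2 * N)) ** 2
        c2 = math.cos(i * math.pi / (2 * N)) ** 2
        lam_i, lam_t = (k * c2, k * s2) if use_filter else (1.0, 1.0)
        S_i, S_t = 1.0 - t * s2, 1.0 - t * c2      # S at i and at N - i
        a = c2 ** nu * S_i ** n_conc * (1.0 - lam_i)
        b = c2 * s2 ** (nu - 1) * lam_t
        c = s2 * c2 ** (nu - 1) * lam_i
        d = s2 ** nu * S_t ** n_conc * (1.0 - lam_t)
        half_tr, det = (a + d) / 2.0, a * d - b * c
        disc = half_tr * half_tr - det
        if disc >= 0.0:
            r = abs(half_tr) + math.sqrt(disc)     # larger |root| of a real pair
        else:
            r = math.sqrt(det)                     # complex pair: |root|^2 = det
        rho = max(rho, r)
    return rho

N = 64   # 63 interior points, as in table 2
for nu, k, n_conc in ((0, 0.666, 4), (1, 0.8, 5), (2, 0.888, 6)):
    print(nu, round(rate_2x2(N, nu, n_conc, k), 3),
          round(rate_2x2(N, nu, n_conc, use_filter=False), 3))
```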

Comparing the last two columns, it is clear that filtering does produce a significantly faster convergence rate, especially when only a small number of pre-relaxation sweeps are performed. Of course, when comparing the convergence rates, one must weigh the additional cost of performing the residual splitting against the improvement in convergence rate. For the one dimensional Poisson equation, using the tridiagonal filter yields a slightly better convergence rate than using one additional pre-relaxation sweep (which costs about the same in computational effort). One can also consider using more sophisticated filters than a simple tridiagonal matrix. However, it appears that more sophisticated filters do not significantly improve the convergence rate when compared to using more relaxation sweeps on the one dimensional Poisson equation. In addition, one can see from table 2 that as more pre-relaxation sweeps are done, less improvement is gained by the filter. This, however, is not surprising. In general, the relaxation sweeps for the one dimensional Poisson equation are very effective in smoothing the error; the relaxation smooths the error so well that it is difficult to improve the convergence properties further by using the filter. However, on problems where good smoothing rates are more difficult to achieve, the filtering idea may yield more significant gains. More analysis and experiments need to be performed.

Possibilities for two dimensional problems are being considered. In two dimensions, the number of points diminishes by a factor of four on the coarse grid. Therefore, to keep the processors busy, it is best to identify four subproblems that can be worked on. One possibility is to filter the residual into four components. One component should be dominated by low frequencies in both the x and y directions, and another by high frequencies in both the x and y directions. The third component should be dominated by high frequencies in x and low frequencies in y, and vice versa for the fourth component. Then one can consider performing the following algorithms on the respective problems: coarse grid correction on the first, relaxation on the second, and semi-coarsening on the third and fourth. Analysis for this method is needed.
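A sketch of one possible four-way splitting, assuming (our choice; the text leaves the filters open) tensor-product filters built from the 1D filter tridiag(.25, .5, .25): the low-frequency part in a direction is obtained by applying the filter along that direction and the high-frequency part is the remainder, so the four components always add back up to the residual. The function names are hypothetical.

```python
import math

def filter_1d(v):
    """tridiag(.25, .5, .25) with zero values outside the grid."""
    n = len(v)
    return [0.25 * (v[j - 1] if j > 0 else 0.0) + 0.5 * v[j]
            + 0.25 * (v[j + 1] if j < n - 1 else 0.0) for j in range(n)]

def filter_x(R):
    """Apply the 1D filter along x (within each row)."""
    return [filter_1d(row) for row in R]

def filter_y(R):
    """Apply the 1D filter along y (down each column)."""
    cols = [filter_1d(list(col)) for col in zip(*R)]
    return [list(row) for row in zip(*cols)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def four_way_split(R):
    """Return (low-low, high-high, high x / low y, low x / high y) parts of R."""
    Px = filter_x(R)                 # low in x
    Hx = sub(R, Px)                  # high in x
    low_low = filter_y(Px)
    low_x_high_y = sub(Px, low_low)
    high_x_low_y = filter_y(Hx)
    high_high = sub(Hx, high_x_low_y)
    return low_low, high_high, high_x_low_y, low_x_high_y

N = 16
smooth = [[math.sin(math.pi * i / N) * math.sin(math.pi * j / N)
           for j in range(1, N)] for i in range(1, N)]
ll, hh, hl, lh = four_way_split(smooth)
```

For a smooth residual almost all of the energy lands in the low-low component, which would go to the coarse grid correction.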

9. CONCLUSION.

It is well known that the multigrid algorithm is among the most effective methods for solving elliptic partial differential equations on serial computers. In this paper, we have shown that it can be mapped to a hypercube in a way that keeps communication costs low. When there are many grid points compared to the number of processors, it is possible to attain almost the maximum possible speedup. When there are many processors and only a few points per processor, the multigrid algorithm is also optimal. Specifically, if the number of points per processor is fixed at one and the number of processors p is varied, the multigrid algorithm achieves the asymptotic lower bound O(log p) for solving pdes. Thus multigrid is optimal both for large processor systems and for small processor systems with many points per processor. These results hold even with the "idle processor problem".

To estimate execution times on realistic machines and for realistic problem sizes, we presented a model of the communication and computation of the multigrid algorithm. The accuracy of the model was verified by comparison with the timing results of our multigrid implementation on a 32 node Intel iPSC system. Using the model, it is possible to predict the execution time of the multigrid algorithm on various hypercubes (i.e., with different machine parameters). Finally, a method has been proposed to alleviate the "idle processor problem" by using the idle processors to concurrently solve a new problem on the fine grid, defined by a splitting of the residual. If the splitting is chosen correctly, the convergence rate of the multigrid algorithm can be improved.
The Fourier Analysis of a Multigrid Preconditioner

INTRODUCTION

Experiments indicate that a multigrid-type cycle can be used as an efficient preconditioner in the iterative solution of the discrete problem corresponding to a singularly perturbed elliptic boundary value problem. Motivated by a report of Goldstein, we explore the theoretical basis for the efficiency of such a preconditioner when applied to a model problem. The techniques developed are also used to analyze a multigrid V-cycle when used alone as a fast iterative solver.


1. THE PROBLEM

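This work is motivated by a report of Charles Goldstein [7] in which the author discusses the task of numerically solving the following elliptic boundary value problem:

$$-\varepsilon^2 \sum_{i=1}^{2} \frac{\partial}{\partial x_i}\left(a_i(x)\frac{\partial u}{\partial x_i}\right) + \varepsilon \sum_{i=1}^{2} b_i(x)\frac{\partial u}{\partial x_i} + a_0(x)u(x) = f(x) \quad \text{in } \Omega \subset \mathbb{R}^2, \qquad (1.1)$$

where $x = (x_1, x_2) \in \Omega$, $0 < \varepsilon \ll 1$, the coefficients and data are sufficiently smooth, and $a_i(x) \ge C_0 > 0$ in Ω, i = 0, 1, 2.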

The discrete problem arising from a typical discretization of (1.1) on a uniform grid of mesh size h, h < ε, is a large system of linear equations. For the solution of this system to approximate the solution of the boundary value problem (1.1) with a fixed accuracy, we must choose a small mesh size when ε is small; specifically, it is sufficient to keep the ratio h/ε fixed, see [11]. In doing so, we not only get a much larger system, but the resulting system is also more poorly conditioned.

With the goal of solving this type of system, we use the conjugate gradient algorithm as our iterative solver. It is known (e.g., [2], [9]) that if we apply the method of conjugate gradients to the problem Bv = F, where B is symmetric positive definite, then the number of iterations, $N_B(\eta)$, required to solve the system to within a given relative error, $\|v - v^i\| / \|v - v^0\| \le \eta$, satisfies

$$N_B(\eta) \le \tfrac{1}{2}\ln\left(\tfrac{2}{\eta}\right)\sqrt{K(B)}, \qquad (1.2)$$

where $K(B) = \lambda_{\max}(B)/\lambda_{\min}(B)$, $v^0$ is the initial guess and $v^i$ is the i-th approximant to the solution v. Our goal is to precondition the system so that the condition number, $K(B')$, of the new system, $B'v' = F'$, is much smaller than $K(B)$ and behaves nicely (bounded or slowly increasing) as ε and h decrease to zero.
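Assuming (1.2) is the standard conjugate gradient estimate, with the error measured in the B-norm, it can be compared with an actual CG run. The sketch uses a hypothetical 1D analogue of the model operator, B = s·tridiag(-1, 2, -1) + I with s = (ε/h)^2, whose condition number is known in closed form; all names are illustrative.

```python
import math

def apply_B(s, v):
    """B = s * tridiag(-1, 2, -1) + I, a 1D analogue of -eps^2 Lap_h + I."""
    n = len(v)
    return [(2 * s + 1) * v[i] - s * ((v[i - 1] if i > 0 else 0.0)
                                      + (v[i + 1] if i < n - 1 else 0.0))
            for i in range(n)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cg_iterations(s, v_exact, eta):
    """Plain conjugate gradients from v0 = 0; counts iterations until the
    B-norm error has been reduced by the factor eta."""
    F = apply_B(s, v_exact)
    v = [0.0] * len(F)
    r = F[:]
    d = r[:]
    rr = dot(r, r)

    def err_B(w):
        e = [a - b for a, b in zip(v_exact, w)]
        return math.sqrt(dot(e, apply_B(s, e)))

    e0 = err_B(v)
    its = 0
    while err_B(v) > eta * e0 and its < 10 * len(F):
        Bd = apply_B(s, d)
        alpha = rr / dot(d, Bd)
        v = [x + alpha * y for x, y in zip(v, d)]
        r = [x - alpha * y for x, y in zip(r, Bd)]
        rr_new = dot(r, r)
        d = [x + (rr_new / rr) * y for x, y in zip(r, d)]
        rr = rr_new
        its += 1
    return its

def condition_number(s, m):
    """K(B) from the eigenvalues 1 + 4 s sin^2(j pi / (2 (m + 1))), j = 1..m."""
    lo = 1 + 4 * s * math.sin(math.pi / (2 * (m + 1))) ** 2
    hi = 1 + 4 * s * math.sin(m * math.pi / (2 * (m + 1))) ** 2
    return hi / lo

def cg_bound(kappa, eta):
    return math.ceil(0.5 * math.sqrt(kappa) * math.log(2 / eta))

m, s, eta = 200, 4.0, 1e-6
v_exact = [math.sin(0.1 * i) + 0.01 * i for i in range(m)]
print(cg_iterations(s, v_exact, eta), cg_bound(condition_number(s, m), eta))
```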

It has been observed experimentally that a certain multigrid-type cycle is an inexpensive preconditioner for this system. The effectiveness of this preconditioner is quite sensitive to the choice of the number of grids, k, used in the multigrid process. Fourier analysis was used in [7] in an attempt to prove that a careful choice of the number of grids does guarantee a good preconditioner in the case where Ω is a rectangle. Although Fourier analysis is routinely used to study 2-grid multigrid cycles, the k-grid analysis for k > 2 is quite unwieldy and is not usually attempted. The difficulty arises from the use of coarser grids on which certain modes "alias" (see [3]) or are "not visible" (see [12]). Unfortunately, this "aliasing" was ignored in [7]. The experimental evidence is so striking, however, that it seemed worth trying to complete the analysis.
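Aliasing is easy to see in 1D. On a fine grid with mesh size h = 1/N, the mode sin(iπx) and its partner sin((N - i)πx) coincide, up to sign, at the even-numbered points that form the coarse grid, and the mode with i = N/2 vanishes there entirely. The sketch below (illustrative function names, injection as the restriction) checks both facts.

```python
import math

def fine_mode(i, N):
    """sin(i pi x) sampled at the fine interior points x = j/N, j = 1..N-1."""
    return [math.sin(i * math.pi * j / N) for j in range(1, N)]

def inject(v):
    """Restrict by injection: keep the even-numbered fine points x = 2J/N."""
    return v[1::2]

N = 16
i = 3
low, partner = inject(fine_mode(i, N)), inject(fine_mode(N - i, N))
print(max(abs(a + b) for a, b in zip(low, partner)))       # rounding level: N - i aliases as -(mode i)
print(max(abs(x) for x in inject(fine_mode(N // 2, N))))   # rounding level: mode N/2 is "not visible"
```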

We examine the effectiveness of the multigrid preconditioner by considering a special case of the boundary value problem (1.1) with $a_i(x) = 1$, i = 0, 1, 2, $b_i(x) = 0$, i = 1, 2, Ω = (0, 1) x (0, 1) and ε real and small. It is for this model operator, $L_\varepsilon = -\varepsilon^2\Delta + I$, that we prove our basic results. More general singularly perturbed problems, such as those with variable coefficients and/or non-symmetric operators with positive definite symmetric part, can be analyzed using the properties of the multigrid preconditioner acting on $L_\varepsilon$ together with such ideas as spectral or norm equivalence, see [5] and [7].

Let $h = 2^{-n}$ for a positive integer n. Discretizing this model problem on the uniform grid $\Omega_h = \{(lh, mh) : l, m = 1, 2, \ldots, 2^n - 1\}$ with mesh size h, using a standard 5-point discretization of the Laplacian (see Section 2), we obtain the linear system

$$A_h u_h := (-\varepsilon^2 \Delta_h + I) u_h = f_h. \qquad (1.3)$$
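Since the 5-point Laplacian with zero Dirichlet values has the sine modes as eigenvectors, $A_h$ has eigenvalues $1 + \varepsilon^2 (4/h^2)(\sin^2(i_1\pi h/2) + \sin^2(i_2\pi h/2))$. A short matrix-free sketch (hypothetical names) confirms this for a few modes.

```python
import math

def apply_Ah(u, eps, h):
    """A_h u = (-eps^2 Lap_h + I) u with the 5-point Laplacian and zero
    Dirichlet values; u holds the (2^n - 1) x (2^n - 1) interior values."""
    m = len(u)

    def at(l, k):
        return u[l][k] if 0 <= l < m and 0 <= k < m else 0.0

    return [[u[l][k] + eps ** 2 * (4 * u[l][k] - at(l - 1, k) - at(l + 1, k)
                                   - at(l, k - 1) - at(l, k + 1)) / h ** 2
             for k in range(m)] for l in range(m)]

def mode(i1, i2, n):
    """Eigenvector sin(i1 pi x) sin(i2 pi y) on the grid Omega_h, h = 2^-n."""
    N = 2 ** n
    return [[math.sin(i1 * math.pi * l / N) * math.sin(i2 * math.pi * k / N)
             for k in range(1, N)] for l in range(1, N)]

def eigenvalue(i1, i2, eps, h):
    """1 + eps^2 (4/h^2) (sin^2(i1 pi h / 2) + sin^2(i2 pi h / 2))."""
    return 1 + eps ** 2 * 4 / h ** 2 * (math.sin(i1 * math.pi * h / 2) ** 2
                                        + math.sin(i2 * math.pi * h / 2) ** 2)

n, eps = 4, 0.05
h = 2.0 ** -n
phi = mode(2, 3, n)
Aphi = apply_Ah(phi, eps, h)
lam = eigenvalue(2, 3, eps, h)
print(max(abs(a - lam * p) for ra, rp in zip(Aphi, phi) for a, p in zip(ra, rp)))
```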
In Section 3 we define a symmetric linear operator, $M_h^k$, based on multigrid ideas, using k - 1 auxiliary grids of larger mesh sizes, $2^p h$, for p = 1, 2, ..., k - 1. In fact, the vector $M_h^k w_h$ is essentially one "partial" multigrid V-cycle applied as if to solve the problem

$$\Delta_h v_h = w_h \qquad (1.4)$$

starting with initial guess $v_h^0 = 0$, where $\Delta_h$ is the matrix resulting from the corresponding discretization of the Dirichlet boundary value problem for Poisson's equation. In order to obtain a symmetric operator, we take symmetric smooths: if $r_p$ smooths are done on the p-th grid in the fine-to-coarse part of the cycle, then $r_p$ smooths must be done on the p-th grid in the coarse-to-fine part. We take a fixed $r_p = r$ for all p = 0, ..., k - 1. The adjective "partial" refers to the following property of this particular V-cycle: instead of solving for the coarse grid correction exactly on the coarsest grid, 2r iterations of the smoother are applied. We choose the smoother to be a damped Jacobi iteration with damping parameter ω, where 0 < ω < 1. Taking ω = 1 would correspond to an undamped Jacobi iteration, but we exclude this choice. The choice ω = .5 corresponds to a Richardson iteration. Using $M_h^k$ as a preconditioner for (1.3), we claim:
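One way to see why ω = 1 is excluded (our reading; the text does not give the reason): on the mode sin(i_1πx) sin(i_2πy), damped Jacobi for the 5-point Laplacian multiplies the error by $1 - \omega(\sin^2(i_1\pi/2N) + \sin^2(i_2\pi/2N))$. With ω = 1 the most oscillatory modes are multiplied by nearly -1, so they are hardly damped. The sketch computes the largest factor over the high-frequency modes; the names are illustrative.

```python
import math

def jacobi_symbol(i1, i2, N, omega):
    """Damped Jacobi factor for the 5-point Poisson matrix on the mode
    sin(i1 pi x) sin(i2 pi y): 1 - omega (sin^2(i1 pi/2N) + sin^2(i2 pi/2N))."""
    return 1 - omega * (math.sin(i1 * math.pi / (2 * N)) ** 2
                        + math.sin(i2 * math.pi / (2 * N)) ** 2)

def smoothing_factor(N, omega):
    """Largest |factor| over the high-frequency modes, max(i1, i2) >= N/2."""
    return max(abs(jacobi_symbol(i1, i2, N, omega))
               for i1 in range(1, N) for i2 in range(1, N)
               if max(i1, i2) >= N // 2)

for omega in (0.5, 0.6, 0.7, 0.8, 0.9, 1.0):
    print(omega, round(smoothing_factor(64, omega), 3))
```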

CLAIM: If the mesh size on the coarsest grid is chosen to be approximately equal to the singular perturbation parameter ε, then the condition number of the preconditioned system is bounded independently of ε and h.
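Choosing k in practice amounts to matching the coarsest mesh size $2^{k-1}h$ to ε. The helper below is hypothetical: it rounds in the log2 scale and keeps k between 1 and n so that the coarsest grid still has an interior point.

```python
import math

def number_of_grids(n, eps):
    """Hypothetical helper: pick k so that the coarsest of the k grids, with
    mesh size 2^(k-1) h and h = 2^-n, is as close as possible to eps
    (nearest in log2 scale), keeping 1 <= k <= n."""
    h = 2.0 ** -n
    k = 1 + round(math.log2(eps / h))
    return min(max(k, 1), n)

for n in (8, 10, 13):
    print(n, [number_of_grids(n, eps) for eps in (0.1, 0.01, 0.001)])
```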

Defining $M_h^\varepsilon = M_h^k$, where k is chosen so that the coarsest grid mesh size $2^{k-1}h \approx \varepsilon$, we justify this claim in 3 steps:

1. In Section 4 we reduce the problem to finding appropriate upper and lower bounds for the eigenvalues of $M_h^\varepsilon A_h$. Let $q : \Omega_h \to \{1, 2, \ldots, (2^n - 1)^2\}$, $(i_1 h, i_2 h) \mapsto q_i$, $i = (i_1, i_2)$, be a given ordering of the $(2^n - 1)^2$ points of $\Omega_h$, and let $\{\alpha_i\}$ be a (given) complete set of eigenvectors of $A_h$. Define a $(2^n - 1)^2 \times (2^n - 1)^2$ matrix $\mathcal{M}$ by

$$(\mathcal{M})_{q_i, q_j} = m_{ij}, \quad\text{where}\quad m_{ij} := (M_h^\varepsilon A_h \alpha_i, \alpha_j)$$

for each $i = (i_1, i_2)$, $j = (j_1, j_2)$ with $1 \le i_1, i_2, j_1, j_2 < 2^n$, and $(\cdot, \cdot)$ is the discrete $L_2$ inner product. Using this eigenfunction analysis (Fourier analysis), the problem reduces to finding bounds on the eigenvalues of $\mathcal{M}$. The off-diagonal elements of $\mathcal{M}$ represent the "aliasing".

2. In Section 5 we obtain a formula for a bound, C^i_{h,k,r,ω}, such that, for every i,
$$ \sum_{j \ne i} |\mu_{ij}| \le C^i_{h,k,r,\omega}\, \mu_{ii}. $$
Therefore we have diagonal dominance of the matrix 𝓜, provided C_{h,k,r,ω}, where
$$ C_{h,k,r,\omega} := \sup_i C^i_{h,k,r,\omega}, $$
can be shown to be less than one. The constant C_{h,k,r,ω} is calculated for r = 1, 2, 3, 4, ω = .5, .6, .7, .8, .9, h = 1/2, 1/4, 1/8, ..., 1/8192 and all possible corresponding values of k. All computed values of C_{h,k,r,ω} are less than one, with the exception of the case where only one smoothing is used and ω < .7.

3. In Section 7 we restate and extend the results of [7], giving explicit bounds on the diagonal entries of the matrix. These bounds, combined with the diagonal dominance, are used to show that
$$ c_1 \varepsilon^2 \le \lambda_{\min}(M_h^\varepsilon A_h^\varepsilon) \le \lambda_{\max}(M_h^\varepsilon A_h^\varepsilon) \le c_2 \varepsilon^2 $$
for constants c_1, c_2 > 0. The diagonal dominance of 𝓜 is needed only to guarantee the positivity of the lower bound.

In Section 8 we describe a few simple experiments which illustrate the efficiency of using the optimal number of grids in the multigrid preconditioner. Experimental comparisons are made between three different solvers for the model problem. In a preconditioned conjugate gradient routine, two preconditioners are used: first, the preconditioner analyzed in this paper, namely the preconditioner based on the Laplacian with smoothing on the coarsest grid; and second, a preconditioner which is based on the model operator itself, solving exactly on the coarsest grid. The third solver used in the comparison is a symmetric multigrid V-cycle.

The techniques used in the analysis of “multigrid-as-a-preconditioner” can also be used to analyse “multigrid-as-a-solver”. This analysis is simpler than the preconditioner analysis since we don’t need diagonal dominance (and we don’t have it). In Section 9 we show how the k-grid convergence bounds obtained in this way compare to the experimentally observed convergence rates and to V-cycle convergence bounds obtained by other methods.
2. NOTATION

Consider the two-dimensional Dirichlet problem
$$ -\Delta u = f \ \text{in } \Omega = (0,1)\times(0,1), \qquad u = 0 \ \text{on } \partial\Omega, \qquad (2.1) $$

where Δ = ∂²/∂x² + ∂²/∂y². We discretize this problem on a family of grids. Let h = 2^{−n}, as in Section 1. Choose a positive integer k, k ≤ n. Define a coarse grid mesh size h_1 = 2^{k−1}h. In Ω we define k nested grids, Ω_p, p = 1, 2, ..., k, with mesh sizes h_p = 2^{1−p}h_1. Clearly h = h_k and
$$ \Omega_p = \{ (x_l, y_m) = (l h_p, m h_p) : l, m = 1, 2, \ldots, N_p - 1 \}, \qquad (2.2) $$
where N_p = 1/h_p and p = 1, 2, ..., k.

We define the discrete operator A_p, the negative of the discrete five-point Laplacian on the grid Ω_p, using the standard five-point discretization of the differential operator −Δ (see e.g. [6]). Each A_p is a sparse (N_p − 1)² × (N_p − 1)² matrix with a complete set of eigenvectors a_i^{(p)} given by
$$ a_i^{(p)}(m,n) = 2\sin(i_1 \pi m h_p)\,\sin(i_2 \pi n h_p), \qquad m, n = 1, \ldots, N_p - 1, \qquad (2.3) $$
where i = (i_1, i_2) and i_1, i_2 = 1, 2, ..., N_p − 1. The corresponding eigenvalues are
$$ \nu_i^{(p)} = \frac{4 - 2\cos(i_1 \pi h_p) - 2\cos(i_2 \pi h_p)}{h_p^2}. \qquad (2.4) $$

As usual, the multigrid operators we consider are constructed from smoothers 𝒢_p, p = 1, 2, ..., k, and intergrid transfer operators I_{p−1}^p and I_p^{p−1}, p = 2, 3, ..., k.

To simplify the analysis we choose 𝒢_p(·,·) to be a damped Jacobi smoother, defined by
$$ \mathcal{G}_p(u_p, f_p) = (I - 2\omega c_p A_p)\,u_p + 2\omega c_p f_p = G_p u_p + (I - G_p) A_p^{-1} f_p, \qquad (2.5) $$
where c_p = h_p²/8, p = 1, ..., k, and G_p = I − 2ωc_pA_p is the linear part of 𝒢_p. We require that 0 < ω < 1. We do not allow ω = 1, which would correspond to an (undamped) Jacobi iteration. The constant c_p is approximately equal to the inverse of the spectral radius ρ(A_p). In fact, c_pρ(A_p) = 1 − O(h_p²), and therefore G_p is a contraction, i.e.,
$$ \rho(I - 2\omega c_p A_p) < 1. \qquad (2.6) $$
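As an illustrative check of (2.4) to (2.6), here is a small pure-Python sketch. It is not code from the paper, and the helper names `laplacian_eigenvalue` and `smoother_spectrum` are hypothetical. The sketch evaluates every eigenvalue ν_i^{(p)} on one grid and confirms that c_pρ(A_p) = (1 + cos πh_p)/2 = 1 − O(h_p²). It also reports max_i |g_i^{(p)}| for the damped Jacobi smoother with ω = .8.

```python
import math

def laplacian_eigenvalue(i1, i2, h):
    """Eigenvalue nu_i^{(p)} of the negative five-point Laplacian A_p, eq. (2.4)."""
    return (4.0 - 2.0 * math.cos(i1 * math.pi * h) - 2.0 * math.cos(i2 * math.pi * h)) / h ** 2

def smoother_spectrum(h, omega):
    """Return (c_p * rho(A_p), max_i |g_i^{(p)}|) for G_p = I - 2*omega*c_p*A_p."""
    n = round(1.0 / h)                      # N_p = 1/h_p
    c = h ** 2 / 8.0                        # c_p = h_p^2 / 8
    nus = [laplacian_eigenvalue(i1, i2, h) for i1 in range(1, n) for i2 in range(1, n)]
    rho_a = max(nus)                        # attained at i = (N_p - 1, N_p - 1)
    max_g = max(abs(1.0 - 2.0 * omega * c * nu) for nu in nus)   # smoother eigenvalues, eq. (2.9)
    return c * rho_a, max_g

for h in (1 / 8, 1 / 16, 1 / 32):
    c_rho, max_g = smoother_spectrum(h, omega=0.8)
    exact = (1 + math.cos(math.pi * h)) / 2
    print(f"h = {h:.5f}  c*rho(A) = {c_rho:.6f}  (1+cos(pi h))/2 = {exact:.6f}  max|g| = {max_g:.4f}")
```

For ω = .8 the most negative smoother eigenvalue is close to 1 − 2ω = −.6, and it belongs to the highest-frequency mode. The largest |g_i^{(p)}| belongs to the smoothest mode and approaches one as h_p → 0. This is why the smoother alone is not an efficient solver.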
We define inner products and norms by
$$ (u_p, v_p)_p = h_p^2 \sum_{x \in \Omega_p} u_p(x)\, v_p(x), \qquad (2.7a) $$
$$ \|u_p\|_p^2 = (u_p, u_p)_p, \qquad (2.7b) $$
for u_p, v_p defined on Ω_p.

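For the projection and weighting operators we take I_{p−1}^p to be linear interpolation:
$$ I_{p-1}^{p} = \frac{1}{4}\begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix}_{p-1}^{p}, \qquad (2.8a) $$
and I_p^{p−1} to be the adjoint of I_{p−1}^p relative to the discrete L2 inner products defined by (2.7a):
$$ I_{p}^{p-1} = \frac{1}{16}\begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix}_{p}^{p-1}, \qquad (2.8b) $$
where we have used the “distribution” and “collection” stencils as in [10].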
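The stencils (2.8a) and (2.8b) can be checked numerically. The following is our own sketch, and every helper name in it is hypothetical. It stores grid functions together with their zero boundary values and applies both stencils. It then verifies the adjoint relation (I_{p−1}^p u, v)_p = (u, I_p^{p−1} v)_{p−1} for random data. The factor 1/16 in (2.8b), rather than the 1/4 of the plain transpose, comes from h_{p−1}² = 4h_p² in (2.7a).

```python
import random

# Common stencil of (2.8a) (distribution, scaled by 1/4) and (2.8b) (collection, scaled by 1/16).
STENCIL = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]

def zeros(n):
    """Grid function on an (n+1) x (n+1) array; boundary rows and columns stay zero."""
    return [[0.0] * (n + 1) for _ in range(n + 1)]

def interpolate(uc):
    """I_{p-1}^p: bilinear interpolation written as the distribution stencil (2.8a)."""
    nc = len(uc) - 1
    uf = zeros(2 * nc)
    for I in range(1, nc):
        for J in range(1, nc):
            for a in (-1, 0, 1):
                for b in (-1, 0, 1):
                    uf[2 * I + a][2 * J + b] += STENCIL[a + 1][b + 1] / 4.0 * uc[I][J]
    return uf

def restrict(rf):
    """I_p^{p-1}: full weighting written as the collection stencil (2.8b)."""
    nc = (len(rf) - 1) // 2
    rc = zeros(nc)
    for I in range(1, nc):
        for J in range(1, nc):
            rc[I][J] = sum(STENCIL[a + 1][b + 1] * rf[2 * I + a][2 * J + b]
                           for a in (-1, 0, 1) for b in (-1, 0, 1)) / 16.0
    return rc

def inner(u, v, h):
    """Discrete L2 inner product (2.7a): h^2 times the sum over interior points."""
    n = len(u) - 1
    return h * h * sum(u[i][j] * v[i][j] for i in range(1, n) for j in range(1, n))

def random_grid(n, rng):
    u = zeros(n)
    for i in range(1, n):
        for j in range(1, n):
            u[i][j] = rng.uniform(-1.0, 1.0)
    return u

rng = random.Random(0)
nf = 16
hf = 1.0 / nf
uc, vf = random_grid(nf // 2, rng), random_grid(nf, rng)
lhs = inner(interpolate(uc), vf, hf)    # (I_{p-1}^p u, v)_p
rhs = inner(uc, restrict(vf), 2 * hf)   # (u, I_p^{p-1} v)_{p-1}
print(lhs, rhs)
```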

Note that eigenvectors of A_p are also eigenvectors of G_p. The eigenvalue g_i^{(p)} of G_p corresponding to a_i^{(p)} is given by
$$ g_i^{(p)} = 1 - 2\omega c_p \nu_i^{(p)}, \qquad (2.9) $$
where the constants c_p are related by
$$ c_{p-1} = 4 c_p. \qquad (2.10) $$

When we apply the multigrid algorithm, we transfer vectors to coarser grids, and in the process we lose information. In this two-dimensional problem with an h-2h grid structure, the four eigenvectors (if i_1 ≠ N_p/2 and i_2 ≠ N_p/2)
$$ a_{(i_1,i_2)}^{(p)},\quad a_{(N_p-i_1,\,i_2)}^{(p)},\quad a_{(i_1,\,N_p-i_2)}^{(p)},\quad a_{(N_p-i_1,\,N_p-i_2)}^{(p)}, $$
defined on Ω_p, are indistinguishable on Ω_{p−1}. On Ω_{p−1} they coincide up to sign. There are also 2N_p − 3 eigenvectors, as defined on Ω_p, which are indistinguishable from the null vector as defined on Ω_{p−1}. This phenomenon is what is referred to as aliasing.

This aliasing plays an important role in the analysis of the multigrid process, and we introduce the following notation. Given two multi-indices i = (i_1, i_2) and j = (j_1, j_2), consider the eigenvectors a_i^{(k)} and a_j^{(k)} restricted to Ω_p. If these restrictions satisfy a_i^{(k)} = ±a_j^{(k)} on Ω_p, then we write i ∼ j (p). If they are not linearly dependent, then i ≁ j (p).
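The aliasing statement can be checked directly from (2.3). The sketch below uses hypothetical helper names and takes N_p = 8 only for illustration. It samples eigenvectors at the points of Ω_{p−1}, i.e. at even fine-grid indices, and confirms that the four aliased vectors coincide there up to sign. It also counts the eigenvectors that vanish on Ω_{p−1}; this count equals 2N_p − 3.

```python
import math

def eigvec(i1, i2, n):
    """a_i^{(p)} from (2.3) on the grid with N_p = n, keyed by interior points (m1, m2)."""
    h = 1.0 / n
    return {(m1, m2): 2.0 * math.sin(i1 * math.pi * m1 * h) * math.sin(i2 * math.pi * m2 * h)
            for m1 in range(1, n) for m2 in range(1, n)}

def on_coarse_grid(a, n):
    """Values of a fine-grid vector at the points of Omega_{p-1} (even fine-grid indices)."""
    return [a[(m1, m2)] for m1 in range(2, n, 2) for m2 in range(2, n, 2)]

def same_up_to_sign(x, y, tol=1e-12):
    return (all(abs(s - t) < tol for s, t in zip(x, y))
            or all(abs(s + t) < tol for s, t in zip(x, y)))

n = 8                      # N_p (illustrative choice)
i1, i2 = 1, 3              # any i with i1, i2 != N_p / 2
base = on_coarse_grid(eigvec(i1, i2, n), n)
aliases = [(n - i1, i2), (i1, n - i2), (n - i1, n - i2)]
aliased = all(same_up_to_sign(base, on_coarse_grid(eigvec(j1, j2, n), n)) for j1, j2 in aliases)

null_count = sum(
    1 for j1 in range(1, n) for j2 in range(1, n)
    if all(abs(v) < 1e-12 for v in on_coarse_grid(eigvec(j1, j2, n), n)))
print(aliased, null_count, 2 * n - 3)
```

The vanishing eigenvectors are exactly those with j_1 = N_p/2 or j_2 = N_p/2. That gives (N_p − 1) + (N_p − 1) − 1 = 2N_p − 3 of them.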


3. DEFINITION OF THE PRECONDITIONER

The multigrid preconditioner is based on the discrete five-point Laplacian. M_k is one standard multigrid symmetric V-cycle starting with zero as the initial guess, except that the coarse grid correction is obtained by smoothing instead of by solving exactly on the coarsest grid. The coarsest grid level is determined by the singular perturbation parameter ε, and this is the only dependence of M_h^ε on ε. See Section 4 for details. Having chosen the appropriate number of grids k, the multigrid preconditioner is defined recursively. Choose a positive (integer) number of smoothings r. Then M_k f_k := u_k, where u_p (= M_p f_p), for f_p defined on Ω_p, p = 1, ..., k, is given by:

1.) Smooth r times starting with initial guess 0:
$$ \bar u_p = \mathcal{G}_p^r(0, f_p). \qquad (3.1a) $$

2.) Compute the residual and transfer it to the coarse grid:
$$ r_p = f_p - A_p \bar u_p, \qquad f_{p-1} = I_p^{p-1} r_p. \qquad (3.1b) $$

3.) Compute the coarse grid correction:
$$ \text{if } p = 2: \quad u_{p-1} = u_1 = \mathcal{G}_1^{2r}(0, f_1); \qquad (3.1c) $$
$$ \text{if } p > 2: \quad u_{p-1} = M_{p-1} f_{p-1}. \qquad (3.1d) $$

4.) Add the coarse grid correction:
$$ \hat u_p = \bar u_p + I_{p-1}^p u_{p-1}. \qquad (3.1e) $$

5.) Smooth r times starting with initial guess û_p:
$$ u_p = \mathcal{G}_p^r(\hat u_p, f_p). \qquad (3.1f) $$

Because we start with an initial guess of zero, the multigrid preconditioner is a linear operator acting on f_k. This definition of M_k can be rewritten as
$$ M_p = (I - G_p^{2r}) A_p^{-1} + G_p^r I_{p-1}^p M_{p-1} I_p^{p-1} G_p^r, \qquad p = 2, \ldots, k, \qquad (3.2) $$
and M_1 = (I − G_1^{2r}) A_1^{-1}.

These identities rely on the commutativity of G_p and A_p, p = 1, 2, ..., k.
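Steps (3.1a) to (3.1f) can be transcribed directly. The pure-Python sketch below assumes the damped Jacobi smoother (2.5), the transfer operators (2.8a)/(2.8b) and the five-point A_p on the unit square. The function names (`precondition`, `smooth`, and so on) are hypothetical, and the grid sizes are only an illustrative choice. Every step is linear in f, so the routine applies a fixed linear operator. (3.2) implies that this operator is self-adjoint and positive with respect to (2.7a), and the check at the end confirms both properties numerically.

```python
import random

STENCIL = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]   # common stencil of (2.8a) and (2.8b)

def zeros(n):
    return [[0.0] * (n + 1) for _ in range(n + 1)]   # zero Dirichlet boundary included

def apply_a(u):
    """A_p: negative five-point Laplacian with h_p = 1/n."""
    n = len(u) - 1
    h2 = (1.0 / n) ** 2
    out = zeros(n)
    for i in range(1, n):
        for j in range(1, n):
            out[i][j] = (4.0 * u[i][j] - u[i - 1][j] - u[i + 1][j] - u[i][j - 1] - u[i][j + 1]) / h2
    return out

def smooth(u, f, omega, times):
    """Damped Jacobi (2.5): u <- u + 2*omega*c_p*(f - A_p u), c_p = h_p^2 / 8."""
    n = len(u) - 1
    step = 2.0 * omega * (1.0 / n) ** 2 / 8.0
    for _ in range(times):
        au = apply_a(u)
        u = [[u[i][j] + step * (f[i][j] - au[i][j]) for j in range(n + 1)] for i in range(n + 1)]
    return u

def restrict(rf):
    nc = (len(rf) - 1) // 2
    rc = zeros(nc)
    for I in range(1, nc):
        for J in range(1, nc):
            rc[I][J] = sum(STENCIL[a + 1][b + 1] * rf[2 * I + a][2 * J + b]
                           for a in (-1, 0, 1) for b in (-1, 0, 1)) / 16.0
    return rc

def interpolate(uc):
    nc = len(uc) - 1
    uf = zeros(2 * nc)
    for I in range(1, nc):
        for J in range(1, nc):
            for a in (-1, 0, 1):
                for b in (-1, 0, 1):
                    uf[2 * I + a][2 * J + b] += STENCIL[a + 1][b + 1] / 4.0 * uc[I][J]
    return uf

def precondition(f, p, omega=0.8, r=2):
    """Apply M_p to f on Omega_p following (3.1a)-(3.1f); p = 1 is the coarsest grid."""
    n = len(f) - 1
    if p == 1:
        return smooth(zeros(n), f, omega, 2 * r)        # 2r smooths replace the exact solve
    u = smooth(zeros(n), f, omega, r)                   # (3.1a)
    au = apply_a(u)
    res = [[f[i][j] - au[i][j] for j in range(n + 1)] for i in range(n + 1)]   # (3.1b)
    uc = precondition(restrict(res), p - 1, omega, r)   # (3.1c) / (3.1d)
    corr = interpolate(uc)
    u = [[u[i][j] + corr[i][j] for j in range(n + 1)] for i in range(n + 1)]  # (3.1e)
    return smooth(u, f, omega, r)                       # (3.1f)

def inner(u, v):
    n = len(u) - 1
    return (1.0 / n) ** 2 * sum(u[i][j] * v[i][j] for i in range(1, n) for j in range(1, n))

def random_grid(n, rng):
    u = zeros(n)
    for i in range(1, n):
        for j in range(1, n):
            u[i][j] = rng.uniform(-1.0, 1.0)
    return u

rng = random.Random(0)
n, k = 16, 3                     # h = 1/16 and three grids, so h_1 = 1/4
f, g = random_grid(n, rng), random_grid(n, rng)
mf, mg = precondition(f, k), precondition(g, k)
sym_gap = abs(inner(mf, g) - inner(f, mg))
energy = inner(mf, f)
print(sym_gap, energy)
```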
4. THE ANALYSIS

As remarked in the introduction, it is sufficient to examine the effectiveness of the multigrid preconditioner by considering the model problem (1.3). We take Ω = (0,1) × (0,1) and ε real and small. It is for this model operator, A^ε = −ε²Δ + I, that we prove our basic results.

Define
$$ A_k^\varepsilon = \varepsilon^2 A_k + I. \qquad (4.1) $$
Writing the symmetric preconditioner as M_k = Q_kQ_k, the preconditioned system is Ã_k^ε v' = F', where Ã_k^ε = Q_k A_k^ε Q_k. Experimental evidence suggests the following:

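CONJECTURE:

Let r > 0, 0 < ω < 1, h > 0 and ε ≥ h. Choose the number of grid levels k so that h_1 = 2^{k−1}h ≈ ε. Define M_h^ε = M_k. Then there exist constants c_1, c_2 > 0 such that
$$ c_1 \varepsilon^2 \le \lambda_{\min}(M_h^\varepsilon A_h^\varepsilon) \le \lambda_{\max}(M_h^\varepsilon A_h^\varepsilon) \le c_2 \varepsilon^2. $$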


What we prove is:

THEOREM 4.1

Let r = 1, 2, 3, 4 and ω = .7, .8, .9, or r = 2, 3, 4 and ω = .5, .6. Let h ≥ 1/8192 and ε ≥ h. Choose k so that h_1 = 2^{k−1}h ≈ ε. Then there exist constants c_1(h), c_2(h) > 0 such that
$$ c_1(h)\,\varepsilon^2 \le \lambda_{\min}(M_k A_k^\varepsilon) \le \lambda_{\max}(M_k A_k^\varepsilon) \le c_2(h)\,\varepsilon^2. \qquad (4.2) $$

REMARK 4.1

For fixed ε, r and ω, numerical evidence indicates that, as h → 0,
$$ c_1(h) \to c_1 > 0, \qquad c_2(h) \to c_2 > 0. $$


REMARK 4.2:

Since Ã_k^ε is similar to M_k A_k^ε, (4.2) implies that the condition number κ(Ã_k^ε) ≤ c_2(h)/c_1(h) is bounded independently of ε.
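The only place where ε enters the preconditioner is the choice of k. The helper below is hypothetical and not from the paper; it is a minimal sketch of one way to make that choice. It picks k so that h_1 = 2^{k−1}h is closest to ε on a logarithmic scale, for h = 2^{−n}, and it clamps k to 1 ≤ k ≤ n so that h_1 ≤ 1/2.

```python
import math

def grids_for_epsilon(n, eps):
    """Hypothetical helper: number of grids k with h_1 = 2**(k-1) * h closest to eps
    (on a log scale), where h = 2**(-n); clamped to 1 <= k <= n so that h_1 <= 1/2."""
    h = 2.0 ** (-n)
    if eps < h:
        raise ValueError("the analysis assumes eps >= h")
    k = 1 + round(math.log2(eps / h))
    return max(1, min(k, n))

for n, eps in [(6, 1 / 8), (6, 1 / 32), (13, 1 / 8)]:
    k = grids_for_epsilon(n, eps)
    print(n, eps, k, 2.0 ** (k - 1 - n))   # the last value printed is h_1
```

For example, h = 1/64 and ε = 1/8 give k = 4 and h_1 = 1/8, which matches the choice h_1 = ε used in Section 8.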


Outline of the Proof of Theorem 4.1:

Define
$$ \mu_{ij} = \left( M_k(\varepsilon^2 A_k + I)\, a_i^{(k)},\, a_j^{(k)} \right)_k. \qquad (4.3) $$
Because of the aliasing, μ_ij can be nonzero for j ≠ i. However, if i ≁ j (1) (i.e., a_i^{(k)} and a_j^{(k)} are distinguishable on the coarsest grid), then μ_ij = 0.

Choose m = (m_1, m_2), where |m| := max(m_1, m_2) < N_k.

Let j_1, j_2, ... be some ordering of the j ∼ m (1).

We now define 𝓜_m to be a 4^{k−1} × 4^{k−1} matrix given by
$$ (\mathcal{M}_m)_{st} = \mu_{j_s j_t}. \qquad (4.4) $$

We consider the subspaces
$$ S_m := \operatorname{span}\{ a_j^{(k)} : j \sim m\ (1) \}, \qquad (4.5) $$
where |m| < N_k. The S_m are mutually orthogonal subspaces (with respect to the inner product defined by (2.7a)), and they are invariant under M_k A_k^ε. Therefore, if we show that
$$ c_1 \varepsilon^2 \le \lambda_{\min}(\mathcal{M}_m) \le \lambda_{\max}(\mathcal{M}_m) \le c_2 \varepsilon^2 \qquad (4.6) $$
for each m, then (4.2) will be proved.

By the Gershgorin theorem, any eigenvalue λ of 𝓜_m must satisfy
$$ |\lambda - \mu_{ii}| \le \sum_{j \sim i\,(1),\ j \ne i} |\mu_{ij}| \qquad (4.7) $$
for some i ∼ m (1).

We show that 𝓜_m is diagonally row dominant, and therefore we can use information about the behaviour of the diagonal entries of 𝓜_m to prove (4.6). Specifically, in Section 5 we give a computable formula, (5.5), for a quantity C^i_{h,k,r,ω}, independent of ε, such that
$$ \sum_{j \sim i\,(1),\ j \ne i} |\mu_{ij}| \le C^i_{h,k,r,\omega}\, \mu_{ii}. \qquad (4.8) $$

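For certain choices of r and ω, C^i_{h,k,r,ω} has been computed for every i. These computations show that C_{h,k,r,ω} := sup_i C^i_{h,k,r,ω} < 1 for the k = 2, 3, ..., 12 grid problems, using h = 2^{−1} to h = 2^{−13}. See Section 6. In Section 7 it is shown that there exist constants c, c̄ > 0 such that
$$ c\,\varepsilon^2 \le \min_{|i| < N_k} \mu_{ii} \le \max_{|i| < N_k} \mu_{ii} \le \bar c\,\varepsilon^2. \qquad (4.9) $$
Combining (4.7), (4.8) and (4.9) with C := C_{h,k,r,ω} < 1, every eigenvalue of 𝓜_m lies in [(1 − C)cε², (1 + C)c̄ε²]. This is (4.6) with c_1 = (1 − C)c and c_2 = (1 + C)c̄.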
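The following toy computation illustrates the Gershgorin step. The 2 × 2 matrix is made up for illustration; the paper's 𝓜_m is 4^{k−1} × 4^{k−1}. The function names are hypothetical. The sketch computes the row-dominance constant C of (4.8) and the Gershgorin interval of (4.7). Whenever C < 1, that interval is contained in [(1 − C) min_i μ_ii, (1 + C) max_i μ_ii].

```python
def gershgorin_interval(a):
    """Interval containing every eigenvalue of the square matrix a (list of rows), as in (4.7)."""
    n = len(a)
    radii = [sum(abs(a[i][j]) for j in range(n) if j != i) for i in range(n)]
    lower = min(a[i][i] - radii[i] for i in range(n))
    upper = max(a[i][i] + radii[i] for i in range(n))
    return lower, upper

def dominance_constant(a):
    """Smallest C with sum_{j != i} |a_ij| <= C * a_ii for every row, cf. (4.8); needs a_ii > 0."""
    n = len(a)
    return max(sum(abs(a[i][j]) for j in range(n) if j != i) / a[i][i] for i in range(n))

a = [[2.0, 0.5],
     [0.5, 1.0]]                      # toy matrix, not taken from the paper
lo, hi = gershgorin_interval(a)
C = dominance_constant(a)
diag = [a[i][i] for i in range(len(a))]
print(lo, hi, C, (1 - C) * min(diag), (1 + C) * max(diag))
```

The lower end stays positive exactly because C < 1. This is the sense in which diagonal dominance is needed only for the lower bound.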
The following lemma is basic to our analysis of the off-diagonal terms.

LEMMA  5.1 

For any n, 1 ≤ n ≤ k, and i ≁ (0,0) (n),

£  =  (5.3) 

i~«  (i) 


where 

Proof:  in  [4]. 

We claim that the J_i can be bounded by an expression which is no more complicated than the expression for D_i (= (M_k a_i^{(k)}, a_i^{(k)})_k):

THEOREM  5.1 


(5.4a) 

(5.4b) 

Proof:  in  [4]. 

REMARK  5.1 

The constants C_{h,k,r,ω} can now be expressed as


a.)  Di  =  2 uck  £ 

p 


b.)  Ji  <  2 tJCk 


C  cP  (  n 

=1  \m=p+l  / 

E  eP  (  1  ~  (  n  GmWm 

P=1  \  \m=p+l 


))  u. 


4|  9  m 


C_{h,k,r,ω} = sup_i C^i_{h,k,r,ω},


where 


'h,k,r,u> 


E  eP  fl  —  n  (mflml  (  n  4|#m|£m»7m 

=1  \  m=p+l  /  \m=p+l  j 


k- 

E 

p 


k  /  k 

e  eP  (  n  - 

p=l  \m=p+ 1 


(5.5) 


49^^f7m 


Note  that  the  denominator  has  one  more  term  in  the  sum  than  does  the  numerator. 


r 
6. COMPUTED VALUES OF THE OFF-DIAGONAL BOUNDS

Ideally, one would like to find analytic bounds for C^i_{h,k,r,ω}, independent of i, h and k. On the other hand, bounds are easily computed for any given h, k, r and ω.

Tables 6.1 and 6.2 give the calculated bounds C_{h,k,r,ω} for ω = .8, r = 1, 2 and various values of h. For ω < .7 we can only prove diagonal dominance for r > 1.

To find bounds for ω = .8 and r = 1, 2, 3, 4, independent of h and k, we used h = 1/8192, which means more than 67 million points on the fine grid! These numbers are bounds for all h ≥ 1/8192 and all k corresponding to these mesh sizes. Observing the asymptotic behaviour leads one to believe that they are also bounds for all h < 1/8192 and any number of grids k. See Tables 6.3 and 6.4.

TABLE 6.1  C_{h,k,r,ω}, ω = .8, r = 1

  h        2 grids  3 grids  4 grids  5 grids  6 grids  7 grids  8 grids
  1/16     .345     .472     .507
  1/32     .357     .508     .589     .602
  1/64     .358     .517     .618     .665     .670
  1/128    .359     .519     .626     .686     .724     .729
  1/256    .359     .519     .628     .692     .742     .753     .755
  1/512    .359     .520     .628     .694     .747     .761     .770

TABLE 6.2  C_{h,k,r,ω}, ω = .8, r = 2

  h        2 grids  3 grids  4 grids  5 grids  6 grids  7 grids  8 grids
  1/16     .195     .255     .264
  1/32     .200     .284     .299     .301
  1/64     .201     .293     .322     .325     .327
  1/128    .202     .296     .329     .334     .339     .339
  1/256    .202     .296     .330     .339     .347     .350     .350
           .202     .230     .330     .340     .350     .354     .354
           .202     .296     .331     .341     .351     .355     .358

TABLE 6.3  C_{h,k,r,ω}, ω = .8, h = 1/8192

  r   2 grids  3 grids  4 grids  5 grids  6 grids  7 grids  8 grids
  1   .3587    .5196    .6284    .6945    .7487    .7638    …780
  2   .2018    .2963    .3305    .3407    .3505    .3550    …358
  3   .1396    .2047    .2282    .2351    .2370    .2375    …239
  4   .1069    .1566    .1745    .1798    .1812    .1818    …182

  r   9 grids  10 grids  11 grids  12 grids  13 grids
  1   .7896    .7933     .7953     .7948     .7951
  2   .3589    .3590     .3592     .3592     .3952
  3   .2394    .2398     .2398     .2398     .2398
  4   .1824    .1825     .1825     .1825     .1825

TABLE 6.4  C_{h,k,r,ω}, k = 12, h = 1/8192

  r   ω = .5   ω = .6   ω = .8   ω = .9
  1                     .795
  2                     .359
  3                     .240
  4                     .183
7. BOUNDS ON THE DIAGONAL ELEMENTS OF 𝓜_m

Recall that the diagonal elements μ_ii of 𝓜_m, where i ∼ m (1), are given by
$$ \mu_{ii} = \left( M_k A_k^\varepsilon a_i^{(k)},\, a_i^{(k)} \right)_k. \qquad (7.1) $$
Since A_k^ε = ε²A_k + I, we have
$$ \mu_{ii} = \left( \varepsilon^2 \nu_i^{(k)} + 1 \right) D_i, \qquad (7.2) $$
and hence the bounds on the μ_ii can be obtained from suitable information about the D_i’s. The following characterization of the effect of the preconditioner on smooth and rough eigenvectors of A_k is central to the analysis and was given by Goldstein in [7].

THEOREM 7.1. For r ≥ 1, ω suitably chosen and h sufficiently small, the D_i’s are positive real numbers such that:

a.) D_i = O(h_1²) for ν_i^{(k)} ≤ d/h_1²,   (7.3a)

b.) D_i = … for ν_i^{(k)} ≥ d/h_1²,   (7.3b)

where 0 < η < 1, η is independent of h, and d is a constant.


In  [4]  we  prove  a  more  explicit  version  of  the  same  result: 

THEOREM  7.2.  For  r>l,0<w<l  and  a  fixed  constant,  d,  where  \  <  d  <2, 


a.) 

^  ..hi  <  Di  < 
max(2,d(l  +  rw)) 

2rw  2 
—h' 

for  t/jk^ 

d 

K  h\ 

(7.4a) 

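b.)
$$ \frac{d\,\omega(1-\omega)}{8(1+r\omega)\,\nu_i^{(k)}} \le D_i \le \frac{1}{\nu_i^{(k)}} \qquad \text{for } \nu_i^{(k)} \ge \frac{d}{h_1^2}. \qquad (7.4b) $$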

Proof:  in  [4]. 

These theorems give us bounds on the μ_ii; for example, Theorem 7.2 leads to the following bounds:

£.  uj(!  ~  <  ..  <  2rud  (,2  ,  A?\ 

h\'  max (2, d(l  +  ru>))  ~  ~  3  \  d) 


For  < 


(7.5a) 
For ν_i^{(k)} ≥ d/h_1²:
$$ \frac{d\,\omega(1-\omega)\,\varepsilon^2}{8(1+r\omega)} \le \mu_{ii} \le \varepsilon^2 + \frac{h_1^2}{d}. \qquad (7.5b) $$
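The passage from the bounds on D_i to the bounds on μ_ii takes one line for the rough modes. Assume (7.4b) in the form dω(1 − ω)/(8(1 + rω)ν_i^{(k)}) ≤ D_i ≤ 1/ν_i^{(k)} for ν_i^{(k)} ≥ d/h_1². Relation (7.2) then gives

```latex
\begin{aligned}
\mu_{ii} &= \bigl(\varepsilon^2 \nu_i^{(k)} + 1\bigr) D_i
  \le \varepsilon^2 + \frac{1}{\nu_i^{(k)}}
  \le \varepsilon^2 + \frac{h_1^2}{d}, \\
\mu_{ii} &\ge \varepsilon^2 \nu_i^{(k)} D_i
  \ge \varepsilon^2 \nu_i^{(k)} \cdot \frac{d\,\omega(1-\omega)}{8(1+r\omega)\,\nu_i^{(k)}}
  = \frac{d\,\omega(1-\omega)}{8(1+r\omega)}\,\varepsilon^2 .
\end{aligned}
```

With h_1 ≈ ε, both bounds are fixed multiples of ε², independent of h. This is the behaviour of the rough modes needed for (4.9).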


Therefore, taking h_1 ≈ ε, we prove (4.10).

Using the diagonal dominance of the matrices 𝓜_m, we can estimate how the condition number of M_k A_k^ε depends on the ratio α = h_1²/ε² from the behaviour of the diagonal elements μ_ii. From the inequalities (7.5) we get an estimate for the choice of α which minimizes the condition number:


^optimal  —  g 


(7.6) 


This predicts that the optimal number of grids decreases as the quantity rω increases. One can also use (7.5) to show that it is better to choose too many grids (α > α_opt) than too few (α < α_opt) (see [4]). These observations all accurately describe the experimental results; see the next section.


8.  EXPERIMENTAL  RESULTS 

Our numerical computations were carried out with three objectives in mind:

i) Observe the optimality of taking the mesh size on the coarsest grid, h_1, to approximate the singular perturbation parameter, ε.

ii) Check the boundedness of the condition number of the multigrid-preconditioned system as ε and the fine grid mesh size, h, decrease.

iii) Compare the efficiency to other fast solvers, in particular, the corresponding multigrid algorithm used as an iterative solver.

We discretize the boundary value problem
$$ A^\varepsilon u := (-\varepsilon^2 \Delta + I)\,u = f \ \text{in } \Omega = (0,1)\times(0,1), \qquad u = 0 \ \text{on } \partial\Omega, \qquad (8.1) $$
on a grid of uniform mesh size h, as in Section 2. Using the multigrid preconditioner M_h^ε, as defined in Section 3, we iteratively solve the discrete problem with a preconditioned conjugate gradient algorithm. Recall that k is the number of grids used in the multigrid algorithm, h_k = h, and the smoothers G_p, 1 ≤ p ≤ k, used to define M_h^ε, depend on the damping parameter ω and a fixed number of smooths per iteration, r. We solve
$$ (\varepsilon^2 A_k + I)\,u_k = F_k, \qquad (8.2) $$
starting with initial guess u_k^0. We call this iterative solver PCCG(−Δ, sm). The “−Δ” reminds us that the multigrid preconditioner is based on A_k, the negative of the discrete Laplacian, and not on the operator A_k^ε = ε²A_k + I. The “sm” indicates that we smooth instead of solving exactly on the coarsest grid. Experimentally, we find that a reasonably good choice of r and ω is r = 2 and ω = .8 (ω = .8 is optimal for the corresponding 2-grid multigrid solver, see [12]).
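The outer iteration of PCCG is a standard preconditioned conjugate gradient method. The pure-Python version below is generic; the names `pcg`, `apply_a` and `apply_m` are hypothetical, and it is not the code used for the experiments. It takes A and M as callables, so the multigrid preconditioner of Section 3 could be plugged in as `apply_m`. In the usage lines, a 1-D analogue of (8.2) with simple diagonal scaling stands in for M, purely to keep the example short.

```python
def pcg(apply_a, apply_m, b, x0, tol=1e-6, max_iter=200):
    """Preconditioned conjugate gradients for a symmetric positive definite system A x = b.

    apply_a and apply_m map a list of floats to a list of floats (A and the
    preconditioner M, both assumed symmetric positive definite). Stops when
    ||r_n|| <= tol * ||r_0|| in the Euclidean norm; returns (x, iterations)."""
    def dot(u, v):
        return sum(s * t for s, t in zip(u, v))

    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, apply_a(x))]
    r0 = dot(r, r) ** 0.5
    if r0 == 0.0:
        return x, 0
    z = apply_m(r)
    p = list(z)
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        ap = apply_a(p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, ap)]
        if dot(r, r) ** 0.5 <= tol * r0:
            return x, it
        z = apply_m(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

# Usage on a hypothetical 1-D analogue of (8.2): (eps^2 A + I) u = 1, with A the negative
# three-point Laplacian on h = 1/32 and diagonal scaling standing in for the preconditioner.
n, eps = 32, 1.0 / 8
h = 1.0 / n

def apply_a(u):
    m = len(u)
    return [eps ** 2 * (2.0 * u[i] - (u[i - 1] if i > 0 else 0.0)
                        - (u[i + 1] if i < m - 1 else 0.0)) / h ** 2 + u[i]
            for i in range(m)]

diag = eps ** 2 * 2.0 / h ** 2 + 1.0
b = [1.0] * (n - 1)
u, iters = pcg(apply_a, lambda r: [ri / diag for ri in r], b, [0.0] * (n - 1), tol=1e-10)
print(iters)
```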

We first consider solving (8.2) with F_k ≡ 1. For h = 1/64, Table 8.1 shows how the number of iterations required to reduce the norm of the residual by a factor of 10^{−6} depends on the choice of ε and h_1. For given ε and h, the number of iterations listed is the largest observed for various choices of u_k^0. Note, in particular, the cases where h_1 = ε.

Table 8.2 displays the number of iterations required to reduce the relative error by a factor of 10^{−8} for various choices of h and ε, taking h_1 = ε. Here we used F_k = 0.

Finally, we compare the efficiency of PCCG(−Δ, sm) to other elliptic solvers. We take h = 1/64, ε = 1/8, F_k ≡ 1 and an initial guess consisting of a smooth and a rough component, namely
$$ u_k^0 = 10 + 20\cos(64\pi x)\cos(64\pi y). $$
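On the grid with h = 1/64, the “rough component” of u_k^0 is as rough as a grid function can be. At a grid point (lh, mh) we have cos(64πlh) = cos(πl) = (−1)^l, so the oscillating part is the checkerboard 20(−1)^{l+m}. The short check below is our own illustration and confirms this pointwise.

```python
import math

n = 64                               # h = 1/64, as in Tables 8.3 and 8.4
h = 1.0 / n

def u0(x, y):
    """Initial guess u_k^0 = 10 + 20 cos(64 pi x) cos(64 pi y)."""
    return 10.0 + 20.0 * math.cos(64 * math.pi * x) * math.cos(64 * math.pi * y)

# At the grid points (i h, j h) the rough part equals 20 * (-1)**(i + j): a checkerboard.
is_checkerboard = all(
    abs(u0(i * h, j * h) - (10.0 + 20.0 * (-1) ** (i + j))) < 1e-9
    for i in range(1, n) for j in range(1, n))
print(is_checkerboard)
```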


We consider a symmetric V-cycle, which is a fast iterative solver for equation (8.1), in which we solve exactly on the coarsest grid (we use a symmetric band solver to invert ε²A_1 + I). We denote this algorithm by MULT. For comparison, we also consider an (extreme) choice of preconditioner for the preconditioned conjugate gradient algorithm: the preconditioner is based on A_k^ε instead of A_k, and we solve exactly on the coarsest grid. In other words, this preconditioner consists of one cycle of the solver MULT, starting with an initial guess of zero. This algorithm is called PCCG(−ε²Δ + I, so). Of course, we expect this preconditioner to behave better than the simpler (−Δ, sm) preconditioner, but we have the added expense of a coarse grid solve and a (slightly) more complicated operator. Of interest to us here is that PCCG(−ε²Δ + I, so) is not a significant improvement over PCCG(−Δ, sm) if the optimal choice of the number of grids is used.

In a conjugate gradient algorithm, the error reduction factor ||e^n||/||e^{n−1}|| typically decreases as the iteration number n increases, whereas for a multigrid algorithm the error reduction factor increases as n increases. Therefore the preconditioned conjugate gradient routines will be more competitive when a large reduction in the relative residual is required, and the multigrid algorithm is more competitive when a smaller reduction in the relative residual is required.
We also observe that increasing the number of smoothings per grid level will improve the performance of MULT more than it will improve the performance of the PCCG(−Δ, sm) algorithm. Similarly, optimizing the choice of the damping parameter ω will improve MULT more than it will improve PCCG(−Δ, sm).

Furthermore, one should keep in mind that, though it is difficult to improve the behaviour of the multigrid preconditioner, it is quite obvious how to improve the multigrid solver: using better smoothers, or using a full multigrid algorithm (FMG), will dramatically improve the convergence rate.

Our first comparison is made with parameters which should give the PCCG(−Δ, sm) algorithm an advantage. We therefore consider a relatively inefficient choice of the damping parameter, ω = .5, and require the norm of the residual to be reduced by a factor of 10^{−12}. The total cpu time (seconds) is recorded in Table 8.3, with the number of iterations given in parentheses next to the time. The PCCG(−Δ, sm) algorithm appears to be competitive with MULT, at least for this mesh size h. The PCCG(−ε²Δ + I, so) algorithm is only slightly faster.

We then take a more reasonable value, ω = .8, and require the norm of the residual to be reduced by a factor of 10^{−6}. The total cpu time is recorded in Table 8.4. The multigrid solver MULT is now the best choice.

All  computations  were  done  on  a  VAX  11/780. 

We end this section with a few comments on the choice between using multigrid by itself as a solver and using multigrid (based on a simpler operator) as a preconditioner:

- For the model problem (8.1), our experiments indicate that, for modest values of h and ε, a good multigrid algorithm is more efficient than a multigrid-preconditioned conjugate gradient algorithm.

- In a true variable coefficient problem, (1.1), the multigrid preconditioner has the advantage of being based on a constant coefficient operator. In this case, using multigrid as a preconditioner should be more competitive than in the model problem case. It is doubtful whether the multigrid preconditioner could outperform a good multigrid solver even in this case, but more testing would need to be done.

- In an indefinite problem, where multigrid solvers are more troublesome, one of the preconditioned conjugate gradient routines for indefinite problems might be preferable.
TABLE 8.1  Optimality of choosing h_1 ≈ ε.
Largest (observed) number of iterations required for ||r_k^n|| / ||r_k^0|| < 10^{−6}.
F_k = 1, ω = .8, r = 2

           ε = 1/2   ε = 1/4   ε = 1/8
           > 20      > 20      20
           12        12        10
           9         8         8
           7         7         9
           7         8         9

TABLE 8.2  Boundedness of condition number independent of h and ε, with h_1 = ε.
Largest (observed) number of iterations required for ||u_k − u_k^n|| / ||u_k − u_k^0|| < 10^{−6}.

F_k = 0, ω = .8, r = 2


h 

£  =  1/4 

£  =  1/8 

£  =  1/16 

e  =  1/32 

■EH 

5 

6 

E  f 

6 

6 

6 

6 

6 

6 

6 
TABLE 8.3  Experimental comparisons of approximate cpu time (sec).
Approximate cpu time (no. of iterations) required for ||res_n|| / ||res_0|| < 10^{−12}.

F_k = 1, ω = .5, r = 2

ε = 1/8, h = 1/64, u_k^0 = 10 + 20 · cos 64πx · cos 64πy

  # of grids   MULT: V(2,2)   PCCG(−Δ, sm)   PCCG(−ε²Δ + I, so)
  2            61.3 (20)      − (> 20)       53.4 (10)
  4            44.2 (21)      40.6 (11)      39.2 (10)
  6            44.4 (21)      ?              39.5 (10)

TABLE 8.4  Experimental comparisons of approximate cpu time (sec).
Approximate cpu time (no. of iterations) required for ||res_n|| / ||res_0|| < 10^{−6}.

F_k = 1, ω = .8, r = 2

ε = 1/8, h = 1/64, u_k^0 = 10 + 20 · cos 64πx · cos 64πy

  # of grids   MULT: V(2,2)   PCCG(−Δ, sm)   PCCG(−ε²Δ + I, so)
  2            24.3 (6)       49.9 (14)      35.2 (5)
  4            ?.3 (6)        22.4 (5)       29.6 (5)
  6            14.4 (6)       23.8 (6)       29.7 (5)
9. V-CYCLE CONVERGENCE BOUNDS

In this section we briefly describe the results of applying the same techniques, in particular Lemma 5.1, to obtain bounds on the asymptotic convergence rates of multigrid V-cycles used to solve the Dirichlet problem for Poisson’s equation in the unit square. The analysis is simpler in this case because we don’t need diagonal dominance. Instead, we numerically evaluate the ||·||_∞ norm of the appropriate matrix (i.e., the largest row sum of absolute values), which is a bound on the spectral radius. See [4] for the details of this analysis.
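
The ||·||_∞ bound works because ρ(B) ≤ ||B|| for every induced matrix norm, so the largest absolute row sum bounds the asymptotic convergence factor. Below is a toy illustration of our own with a made-up 2 × 2 matrix. It compares that row sum with a power-iteration estimate of the spectral radius; the helper names are hypothetical.

```python
def inf_norm(a):
    """||a||_inf: largest row sum of absolute values."""
    return max(sum(abs(x) for x in row) for row in a)

def power_iteration(a, iters=200):
    """Estimate of the dominant eigenvalue modulus (assumes a simple dominant eigenvalue)."""
    n = len(a)
    x = [1.0] * n
    scale = 1.0
    for _ in range(iters):
        y = [sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        scale = max(abs(v) for v in y)
        x = [v / scale for v in y]
    return scale

a = [[0.5, 0.3],
     [0.2, 0.4]]                 # toy matrix with eigenvalues 0.7 and 0.2
rho = power_iteration(a)
print(rho, inf_norm(a))          # the row-sum bound 0.8 exceeds rho = 0.7
```
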
We first define our basic multigrid V-cycle applied to the linear system
$$ B_k u_k = F_k, \qquad (9.1) $$
starting with initial guess u_k^0, with auxiliary problems B_p u_p = f_p, p = 1, 2, ..., k − 1, corresponding to discretizations on the coarser grids.

1. Initialize:

f_k ← F_k

u_k ← u_k^0

2.  Update: 

u*  <-  u* 

where  each  up,  p  =  2, 3, . . . ,  k  is  defined  recursively  by: 

(a.) Smooth r times starting with initial guess u_p:

u_p = 𝒢_p^r(u_p, f_p)

(b.) Compute the residual and transfer it to the next coarser grid:

r_p = f_p − B_p u_p,  f_{p−1} = I_p^{p−1} r_p

(c.) If p > 2, then return to (a.) to evaluate u_{p−1}. If p = 2, then:

u_1 = B_1^{−1} f_1

(d.) Add the coarse grid correction:

u_p ← u_p + I_{p−1}^p u_{p−1}

(e.) Smooth s times starting with initial guess u_p:

u_p = 𝒢_p^s(u_p, f_p)
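Steps (a.) to (e.) can be sketched in pure Python for the five-point case B_p = A_p. The sketch assumes damped Jacobi smoothing as in (2.5) and the transfer operators (2.8a)/(2.8b), and it starts each coarser level from a zero guess. To keep it short, it also assumes that the coarsest grid has a single unknown (N_1 = 2), so the exact solve in (c.) is a division. The function names are hypothetical. The sketch runs an M\ cycle (s = 0) on B u = 0 from a random guess and estimates the asymptotic convergence factor. One would expect a value in the vicinity of the rates in Tables 9.5 and 9.6, but this sketch is not the code behind those tables.

```python
import random

STENCIL = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]   # common stencil of (2.8a) and (2.8b)

def zeros(n):
    return [[0.0] * (n + 1) for _ in range(n + 1)]   # zero Dirichlet boundary included

def apply_b(u):
    """B_p = A_p, the negative five-point Laplacian with h_p = 1/n."""
    n = len(u) - 1
    h2 = (1.0 / n) ** 2
    out = zeros(n)
    for i in range(1, n):
        for j in range(1, n):
            out[i][j] = (4.0 * u[i][j] - u[i - 1][j] - u[i + 1][j] - u[i][j - 1] - u[i][j + 1]) / h2
    return out

def smooth(u, f, omega, times):
    """Damped Jacobi: u <- u + 2*omega*c_p*(f - B_p u) with c_p = h_p^2 / 8."""
    n = len(u) - 1
    step = 2.0 * omega * (1.0 / n) ** 2 / 8.0
    for _ in range(times):
        bu = apply_b(u)
        u = [[u[i][j] + step * (f[i][j] - bu[i][j]) for j in range(n + 1)] for i in range(n + 1)]
    return u

def restrict(rf):
    nc = (len(rf) - 1) // 2
    rc = zeros(nc)
    for I in range(1, nc):
        for J in range(1, nc):
            rc[I][J] = sum(STENCIL[a + 1][b + 1] * rf[2 * I + a][2 * J + b]
                           for a in (-1, 0, 1) for b in (-1, 0, 1)) / 16.0
    return rc

def interpolate(uc):
    nc = len(uc) - 1
    uf = zeros(2 * nc)
    for I in range(1, nc):
        for J in range(1, nc):
            for a in (-1, 0, 1):
                for b in (-1, 0, 1):
                    uf[2 * I + a][2 * J + b] += STENCIL[a + 1][b + 1] / 4.0 * uc[I][J]
    return uf

def v_cycle(u, f, omega, r, s):
    """One V-cycle, steps (a.)-(e.); assumes the coarsest grid has N_1 = 2 (one unknown)."""
    n = len(u) - 1
    if n == 2:                                          # (c.): exact solve of B_1 u_1 = f_1
        u1 = zeros(2)
        u1[1][1] = f[1][1] * (1.0 / n) ** 2 / 4.0
        return u1
    u = smooth(u, f, omega, r)                          # (a.)
    bu = apply_b(u)
    res = [[f[i][j] - bu[i][j] for j in range(n + 1)] for i in range(n + 1)]
    uc = v_cycle(zeros(n // 2), restrict(res), omega, r, s)   # (b.), coarser level from zero
    corr = interpolate(uc)
    u = [[u[i][j] + corr[i][j] for j in range(n + 1)] for i in range(n + 1)]  # (d.)
    return smooth(u, f, omega, s)                       # (e.)

def asymptotic_rate(n, omega, r, s, cycles=30, seed=1):
    """Cycle on B u = 0 (so u is the error) and return the mean factor of the last 10 cycles."""
    rng = random.Random(seed)
    u = zeros(n)
    for i in range(1, n):
        for j in range(1, n):
            u[i][j] = rng.uniform(-1.0, 1.0)
    f = zeros(n)
    norms = []
    for _ in range(cycles):
        u = v_cycle(u, f, omega, r, s)
        norms.append(sum(x * x for row in u for x in row) ** 0.5)
    return (norms[-1] / norms[-11]) ** 0.1

rate = asymptotic_rate(16, omega=0.8, r=1, s=0)   # M\ cycle, h = 1/16, four grids
print(f"estimated asymptotic rate: {rate:.3f}")
```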
The sharpest bounds on the asymptotic convergence rates of the V-cycle are obtained by these techniques when no smoothing is performed on the coarse-to-fine part of the cycle, i.e., s = 0 in step (e.). This is called an M\ cycle. The symmetric cycle, i.e., s = r, is called an MG cycle. We consider two discretizations of the Laplacian: the five-point discretization, B_p = A_p, as given in Section 2, and a certain nine-point discretization given by the following stencil:


$$ \begin{bmatrix} -1 & -1 & -1 \\ -1 & +8 & -1 \\ -1 & -1 & -1 \end{bmatrix} \qquad (9.2) $$


The corresponding V-cycles will be denoted by, e.g., M_5\ or MG_9, to indicate which discretization is being used.

We consider an M_5\ algorithm and compare our theoretical bounds to the experimentally observed asymptotic convergence rates. Tables 9.1 to 9.4 show our theoretical bounds for r = 1, 2, 3, 4 and ω = 4/5. The experimentally observed asymptotic convergence rates are shown in Table 9.5 for r = 1, 2, 3, 4, ω = 4/5 and h = 1/64. The exact two-grid convergence rates, cf. [12], are also shown; see Table 9.6.

We compare our bounds to the finite element bounds of [8], using the MG_9 cycle given by taking B_p = Â_p, the nine-point operator (9.2), and s = r. The comparison is possible because the operators Â_p satisfy
$$ \hat A_{p-1} = I_p^{p-1} \hat A_p I_{p-1}^p, \qquad p = 2, \ldots, k. \qquad (9.3) $$
Eigenvectors of A_p are also eigenvectors of Â_p. We also note that, for a symmetric V-cycle, convergence bounds in the energy norm are equivalent to asymptotic convergence bounds given by the spectral radius. Our bounds are given in Table 9.7 for ω = 3/4, h = 1/64, and r = 1, 2, 3, 4. In the next-to-last column of Table 9.7 we show the bounds obtained by the methods of [8], which are independent of the number of grids used. We also calculate the exact two-grid convergence rates for MG_9, as in [12]. These numbers are given in the last column of Table 9.7. In this symmetric case, at least for small r, our bounds are larger than the finite element bounds, because in the Fourier analysis we essentially throw away the post-smoothing factors in the off-diagonal terms in order to apply Lemma 5.1.
TABLE 9.1  M_5\ asymptotic convergence bounds, ω = .8, r = 1

  2 grids  3 grids  4 grids  5 grids  6 grids  7 grids
  .615     .719     .715
  .622     .749     .769     .750
  .624     .758     .797     .800     .787
  .625     .760     .808     .826     .820     .815
  .625     .761     .812     .835     .835     .830

TABLE 9.2  M_5\ asymptotic convergence bounds, ω = .8, r = 2

  2 grids  3 grids  4 grids  5 grids  6 grids  7 grids
  .369     .454     .455
  .370     .460     .481     .481
  .370     .466     .490     .491     .491
  .370     .467     .495     .499     .500     .499
  .370     .468     .495     .502     .505     .505

TABLE 9.3  M_5\ asymptotic convergence bounds, ω = .8, r = 3


2  grids 


5  grids 


6  grids 


7  grids 


h 

2  grids 

3  grids 

4  grids 

5  grids 

6  grids 

7  grids 

1/16 

.284 

1 

m 

1/32 

.284 

■ 

fey:?  1 

1/64 

.284 

.302 

i 

■ 

1/128 

.221 

.284 

.302 

.307 

; 

.309 

TABLE 9.5  M_5\ experimental asymptotic convergence rates
ω = .8, h = 1/64

  r   2 grids  3 grids  4 grids  5 grids  6 grids
  1   .600     .600     .600     .600     .600
  2   .360     .360     .360     .360     .360
  3   .216     .228     .233     .242     .246
  4   .137     .158     .171     .181     .193

TABLE 9.6  M_5\ two-grid asymptotic convergence rates, ω = .8

  h       r = 1   r = 2   r = 3   r = 4
  1/16    .592    .351    .208    .135
  1/32    .598    .358    .214    .137
  1/64    .600    .359    .216    .137
  1/128   .600    .360    .216    .137

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1.  INTRODUCTION 

This work is motivated by (i) the large number of convergence proofs for the multigrid V-cycle, and (ii) the experimental evidence of the success of the multigrid V-cycle for "nasty" domains; for example, see [11]. A careful analysis of the V-cycle convergence proofs [10] indicates that all are equivalent to the application of the "algebraic criterion" of McCormick [7] (cf. equation (4.5) in this paper, and also [3]-[7]). These convergence proofs yield an upper bound on the rate of convergence which is (a) less than one, and (b) independent of the number of grids involved in the process. The bounds on the convergence factor are not very sharp (see, e.g., [2]), and, in all cases that we know of, the verification of the "algebraic criterion" follows from complete H² regularity of the operator (in the case of second order elliptic problems). Unfortunately, in "nasty" domains one does not have complete H² regularity. On the other hand, experimental studies (of necessity) involve only a finite number of grids, usually no more than five.
Thus, we are led to two questions:

I. Is complete H² regularity essential for the algebraic criterion?

II. If yes, can one obtain a good estimate on the rate of convergence of the V-cycle as a function of the number of grids, that is, an estimate showing that the potential degradation of the rate of convergence of the multigrid process (degradation as the number of grids increases) is slow in those cases where full H² regularity is absent?

In this report we show that the answer to both questions is "yes" for the model problem: the Dirichlet problem for the Poisson equation in a polygonal domain. This problem is formulated in section 2. The multigrid method for its solution is given in section 3. In section 4 we discuss the relationship between the discrete regularity assumption (which follows from H² regularity) and the usual assumptions which lead to V-cycle convergence theorems. In section 5 we prove that the discrete regularity assumption implies H² regularity. In section 6 we discuss the V-cycle for less regular problems. In particular, if the best estimate for Ω is

$$\|u\|_{H^{1+\alpha}(\Omega)} \le C\,\|\Delta u\|_{H^{\alpha-1}(\Omega)} \qquad (1.1)$$

with some α ∈ (0, 1), then we find that if one employs a V-cycle with k grids, the convergence factor in the energy norm is bounded by

$$1 - b_1 k^{-(1-\alpha)/\alpha} \qquad (1.2)$$

with some constant $b_1$.

Note that if h, the fine-grid mesh parameter, is given by

$$h = \text{const}\cdot 2^{-k},$$

then k is proportional to |ln h| for small h, so the convergence factor is bounded by

$$1 - \text{const}\cdot|\ln h|^{-(1-\alpha)/\alpha},$$

which is much better than, e.g., the usual convergence rates for SOR (which are of the form 1 - O(h)).

For  completeness,  note  that  an  analysis  of  the  W-cycle  for  those  cases  where  (1.1)  is 
the  best  possible  inequality  is  contained  in  [6],  [7]  and  in  further  references  therein. 

Remark. It should be noted that while all experimental studies show more noticeable degradation of the rate of convergence for "nasty" domains than for "nice" domains, there is no experimental study which suggests that the actual convergence factor $\rho_k \to 1$. Our bounds are strict upper bounds for all experiments known to us.
2.  THE  PROBLEM

Let Ω be a polygonal domain in ℝ². For simplicity, we restrict ourselves to the model problem

$$-\Delta u = f \ \text{in } \Omega, \qquad u|_{\partial\Omega} = 0, \qquad (2.1)$$

in a variational formulation:

$$u \in H:\quad a(u,v) = (f,v) \quad \forall v \in H,$$

$$a(u,v) = \int_\Omega \nabla u\cdot\nabla v, \qquad (f,v) = \int_\Omega f v, \qquad H = H^1_0(\Omega).$$

Let $S_0 \subset S_1 \subset S_2 \subset \cdots \subset S_k \subset \cdots \subset H$ be a hierarchy of the usual linear finite element spaces with characteristic mesh sizes $h_0, h_1, h_2, \ldots$, where $h_0 = 2h_1$, $h_1 = 2h_2$, etc. Let the usual shape and size regularity conditions be satisfied. Identify each $S_k$ with the isomorphic space of nodal values $u(x_i)$. Define the inner product on $S_k$ by

$$(u,v)_k = h_k^2 \sum_i u(x_i)\,v(x_i)$$

(summing over all nodes $x_i$ associated with $S_k$).

The associated norms $\|u\|_k^2 = (u,u)_k$ are uniformly equivalent to the $L^2(\Omega)$ norm restricted to $S_k$.

The discretization of (2.1) is then

$$L_k U_k = F_k \qquad (2.2)$$

with $L_k : S_k \to S_k$ defined by

$$(L_k u_k, v_k)_k = a(u_k, v_k) \quad \forall u_k, v_k \in S_k \qquad (2.3)$$

and $F_k \in S_k$ given by

$$(F_k, v_k)_k = (f, v_k) \quad \forall v_k \in S_k. \qquad (2.4)$$


3.  THE  MULTIGRID  ALGORITHM 

Each function $u \in S_{k-1}$ is also in $S_k$. Accordingly, there is a linear operator $I^k_{k-1} : S_{k-1} \to S_k$ such that if $u_{k-1}$ is the vector of nodal values of u, then

$$u_k = I^k_{k-1} u_{k-1}$$

is the vector of nodal values of the same function u considered as an element of $S_k$. Let $I^{k-1}_k = (I^k_{k-1})^*$, the adjoint in the $(\cdot,\cdot)_k$ inner products. That is, if $u_{k-1} \in S_{k-1}$ and $w_k \in S_k$, then

$$(I^k_{k-1} u_{k-1}, w_k)_k = (u_{k-1}, I^{k-1}_k w_k)_{k-1}.$$

With this notation it is easy to verify from (2.3) that

$$L_{k-1} = I^{k-1}_k L_k I^k_{k-1}. \qquad (3.1)$$

These are our "coarse grid operators".
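To make the adjoint and (3.1) concrete, here is a hedged one-dimensional analog (the paper works in 2D; the 1D model, the uniform meshes, and the helper names are our assumptions). On (0, 1) with $h_k = 1/(n+1)$, the 1D version of the inner product is $(u,v)_k = h_k\sum_i u(x_i)v(x_i)$, the operator of (2.3) is $L_k = \mathrm{tridiag}(-1,2,-1)/h_k^2$, and with $h_{k-1} = 2h_k$ the adjoint of linear interpolation P is $P^T/2$. The script checks the adjoint identity and that the Galerkin product (3.1) reproduces the directly discretized coarse operator.

```python
import numpy as np

def laplacian_1d(n):
    """1D analog of L_k from (2.3): tridiag(-1, 2, -1) / h^2 with h = 1/(n+1)."""
    h = 1.0 / (n + 1)
    return (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2, h

def prolongation_1d(nc):
    """Linear interpolation I_{k-1}^k from nc coarse to 2*nc+1 fine interior nodes."""
    P = np.zeros((2 * nc + 1, nc))
    for j in range(nc):
        P[2 * j : 2 * j + 3, j] = [0.5, 1.0, 0.5]   # coarse node j sits at fine index 2j+1
    return P

def inner(u, v, h):
    """1D analog of (u, v)_k = h_k^2 * sum u_i v_i, namely h * sum u_i v_i."""
    return h * np.dot(u, v)

nc = 7
Lf, hf = laplacian_1d(2 * nc + 1)   # fine level, h = 1/16
Lc, hc = laplacian_1d(nc)           # coarse level, h = 1/8
P = prolongation_1d(nc)
R = P.T / 2.0                       # I_k^{k-1}: adjoint of P in the (.,.)_k inner products

rng = np.random.default_rng(1)
uc, wf = rng.normal(size=nc), rng.normal(size=2 * nc + 1)
print(np.isclose(inner(P @ uc, wf, hf), inner(uc, R @ wf, hc)))   # adjoint identity
print(np.allclose(R @ Lf @ P, Lc))                                 # (3.1) matches direct coarse operator
```

The factor 1/2 in R comes from $h_{k-1} = 2h_k$ in the 1D inner product; in the paper's 2D setting the same reasoning with $h_k^2$ gives a factor 1/4.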

In each space $S_k$ we consider the "smoother" $G_k(u,F)$ given by

$$G_k(u,F) = G_k u + \frac{1}{\rho(L_k)} F, \qquad (3.2)$$

where

$$G_k = I - \frac{1}{\rho(L_k)} L_k, \qquad (3.3)$$

and $\rho(L_k)$ denotes the spectral radius of $L_k$.

We now define the multigrid algorithm MG(k, u, F) in a recursive way. Let m ≥ 1 be a fixed integer. Let $u_k$ be our current approximant of $U_k$, the solution of $L_k U_k = F_k$. Then we obtain our new approximant as follows:

(α) If k = 0, then

$$MG(0, u_0, F_0) = U_0 = L_0^{-1} F_0.$$

(β) If k ≥ 1, perform the following:

(i) Do $u_k \leftarrow G_k(u_k, F_k)$ m times.

(ii) Set $r_k = F_k - L_k u_k$ and $F_{k-1} = I^{k-1}_k r_k$.

(iii) $u_{k-1} \leftarrow MG(k-1, 0, F_{k-1})$.
Decker,  Mandel,  and  Parter 


147 


(iv)  u*  *-  u*  + 

(v)  Do  m  «— G*(u*, Ft)  m  times. 

Then,  the  result  of  steps  (i)-(v)  gives 

Uk  «-  MG(k,Uk,Fk ) . 
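The recursion (α)-(β) can be sketched in Python for a hedged 1D analog: -u'' = f on (0, 1) with Dirichlet conditions. The model problem, the helper names, and the level indexing (level k has $2^{k+1}-1$ interior nodes, level 0 a single node) are our assumptions. The smoother is the Richardson step (3.2)-(3.3), the restriction is the adjoint $P^T/2$ of linear interpolation, and the coarse operators are the Galerkin products (3.1).

```python
import numpy as np

def laplacian_1d(n):
    h = 1.0 / (n + 1)
    return (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

def prolongation_1d(nc):
    P = np.zeros((2 * nc + 1, nc))
    for j in range(nc):
        P[2 * j : 2 * j + 3, j] = [0.5, 1.0, 0.5]
    return P

def build_hierarchy(levels):
    """Operators L_0..L_levels (Galerkin, as in (3.1)), prolongations, and rho(L_k)."""
    ops = [None] * (levels + 1)
    prolongs = [None] * (levels + 1)
    ops[levels] = laplacian_1d(2 ** (levels + 1) - 1)
    for k in range(levels, 0, -1):
        prolongs[k] = prolongation_1d(2 ** k - 1)
        ops[k - 1] = (prolongs[k].T / 2.0) @ ops[k] @ prolongs[k]
    rhos = [np.max(np.linalg.eigvalsh(L)) for L in ops]
    return ops, prolongs, rhos

def mg(k, u, F, ops, prolongs, rhos, m=1):
    """Symmetric V-cycle MG(k, u, F) following steps (alpha) and (beta)(i)-(v)."""
    L = ops[k]
    if k == 0:                                   # (alpha): exact solve on the coarsest grid
        return np.linalg.solve(L, F)
    for _ in range(m):                           # (i): u <- G_k(u, F) = u + (F - L u)/rho(L_k)
        u = u + (F - L @ u) / rhos[k]
    r = F - L @ u                                # (ii): residual ...
    Fc = prolongs[k].T @ r / 2.0                 #       ... restricted by I_k^{k-1} = P^T / 2
    uc = mg(k - 1, np.zeros_like(Fc), Fc, ops, prolongs, rhos, m)   # (iii)
    u = u + prolongs[k] @ uc                     # (iv): coarse-grid correction
    for _ in range(m):                           # (v): post-smoothing
        u = u + (F - L @ u) / rhos[k]
    return u

levels = 5
ops, prolongs, rhos = build_hierarchy(levels)
L = ops[levels]
F = np.ones(L.shape[0])
U = np.linalg.solve(L, F)                        # discrete solution U_k
u = np.zeros_like(F)
for it in range(8):
    u = mg(levels, u, F, ops, prolongs, rhos)
    # energy norm of the error, up to the constant factor sqrt(h) from (.,.)_k
    print(it, np.sqrt((U - u) @ L @ (U - u)))
```

The printed errors should decrease by a roughly constant factor per cycle; the ratio, not the absolute value, is what the convergence factor measures.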


Remark. This particular multigrid algorithm is called a symmetric V-cycle. It is easy to see that it is the composition of two "one-sided" V-cycles. If step (i) is omitted, this algorithm is the coarse-to-fine V-cycle $M^{/}_k(u_k, F_k)$. If step (v) is omitted, this algorithm is the fine-to-coarse V-cycle $M^{\backslash}_k(u_k, F_k)$. It is well known [8], [9] that

$$MG(k, u_k, F_k) = (M^{/}_k \circ M^{\backslash}_k)(u_k, F_k) \qquad (3.4)$$

and

$$M^{\backslash}_k(\cdot,\cdot)^* = M^{/}_k(\cdot,\cdot), \qquad (3.5)$$

where the adjoint is taken in the energy inner product $(\cdot,\cdot)_{L_k}$, defined by

$$(u, v)_{L_k} = (L_k u, v)_k.$$


4.  EQUIVALENCE  OF  THE  DISCRETE  REGULARITY  ASSUMPTIONS  AND  V-CYCLE  ASSUMPTIONS

Denote

$$T_k := I - I^k_{k-1} L_{k-1}^{-1} I^{k-1}_k L_k,$$

the $L_k$-orthogonal projection onto the null space of $I^{k-1}_k L_k$. Let

$$|||e|||^2 = (L_k e, e)_k.$$

It is well known (see, e.g., [2]) that if Ω is convex, then H² regularity holds. That is, there is a constant C such that if u is the solution of (2.1) and $f \in L^2(\Omega)$, then

$$\|u\|_{H^2(\Omega)} \le C\,\|f\|_{L^2(\Omega)}. \qquad (4.1)$$

Further, we know, e.g., from [6], [7], that (4.1) and the assumptions about finite elements above imply

$$\exists\, \delta \ \text{so that}\ \forall k:\quad \rho(T_k L_k^{-1})\,\rho(L_k) \le \delta. \qquad (4.2)$$

We call (4.2) the "discrete regularity assumption".
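For a hedged 1D analog (our assumption; the paper's setting is 2D), the quantity in (4.2) can be computed directly. In 1D the energy-orthogonal complement of the coarse space consists of the fine-grid hat functions at the new nodes, and a short calculation (ours, not the paper's) gives $T_kL_k^{-1} = (h^2/2)\,D$, with D the diagonal projection onto the new nodes, so $\rho(T_kL_k^{-1})\rho(L_k) = 2\cos^2(\pi h/2) < 2$: the constant δ stays bounded as h → 0, as (4.2) asserts for convex domains. The script checks this numerically.

```python
import numpy as np

def laplacian_1d(n):
    h = 1.0 / (n + 1)
    return (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2, h

def prolongation_1d(nc):
    P = np.zeros((2 * nc + 1, nc))
    for j in range(nc):
        P[2 * j : 2 * j + 3, j] = [0.5, 1.0, 0.5]
    return P

def discrete_regularity_constant(k):
    """rho(T_k L_k^{-1}) * rho(L_k) on a fine grid with 2^k - 1 interior nodes."""
    nc = 2 ** (k - 1) - 1
    L, h = laplacian_1d(2 ** k - 1)
    P = prolongation_1d(nc)
    R = P.T / 2.0                                           # adjoint of P in (.,.)_k
    Lc = R @ L @ P                                          # coarse grid operator (3.1)
    TLinv = np.linalg.inv(L) - P @ np.linalg.inv(Lc) @ R    # T_k L_k^{-1} (symmetric)
    rho_TL = np.max(np.abs(np.linalg.eigvalsh(TLinv)))
    rho_L = np.max(np.linalg.eigvalsh(L))
    return rho_TL * rho_L, h

for k in range(2, 8):
    delta_k, h = discrete_regularity_constant(k)
    predicted = 2 * np.cos(np.pi * h / 2) ** 2
    print(f"h = 1/{round(1 / h)}: {delta_k:.6f}  (2 cos^2(pi h/2) = {predicted:.6f})")
```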

THEOREM. Let $G_k = I - L_k/\rho(L_k)$ and let m > 0 be fixed. Then (4.2) is equivalent to each of the following:

(4.3) ∃ γ < 1 so that ∀ h and ∀ e ∈ $S_k$:

$$|||(G_k)^m e|||^2 \le \gamma\,|||T_k e|||^2 + |||(I - T_k)e|||^2;$$

(4.4) ∃ β > 0 so that ∀ h and ∀ e ∈ $S_k$ with $(G_k)^m e \ne 0$:

$$\frac{|||e|||^2}{|||(G_k)^m e|||^2} - 1 \ \ge\ \beta\,\frac{|||T_k (G_k)^m e|||^2}{|||(G_k)^m e|||^2}.$$

Proof: From Mandel, McCormick, Ruge [6] (see also [7]) we know that (4.2) is equivalent to

$$\exists\, \delta' \ \text{so that}\ \forall e \in S_k:\quad |||T_k e|||^2 \le \delta'\,\frac{\|L_k e\|_k^2}{\rho(L_k)}, \qquad (4.5)$$

with $\|\cdot\|_k$ the norm induced by $(\cdot,\cdot)_k$. From [6], Theorems 6.2 and 6.1, we have that (4.2) implies (4.3) with

$$\gamma = 1 - \frac{1}{1 + \delta/(2m)}$$

and (4.4) with

$$\beta = \frac{2m}{\delta}.$$

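Proof of (4.3) ⇒ (4.5): We have

$$\gamma\,|||T_k e|||^2 + |||(I - T_k)e|||^2 = |||e|||^2 - (1-\gamma)\,|||T_k e|||^2,$$

and, using the inequality $(1-\lambda)^{2m} \ge 1 - 2m\lambda$ (applied to the eigenvalues λ ∈ (0, 1] of $L_k/\rho(L_k)$), we get

$$|||(G_k)^m e|||^2 \ge |||e|||^2 - 2m\,\frac{\|L_k e\|_k^2}{\rho(L_k)},$$

so (4.3) gives

$$|||e|||^2 - 2m\,\frac{\|L_k e\|_k^2}{\rho(L_k)} \le |||e|||^2 - (1-\gamma)\,|||T_k e|||^2, \qquad (4.6)$$

which is (4.5) with $\delta' = 2m/(1-\gamma)$.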
Remark. We have considered the smoother $G_k = I - L_k/\rho(L_k)$ for convenience only, in order to avoid inessential technical arguments. Similar results can be proved for $G_k = I - \frac{\omega}{\rho(L_k)} L_k$, 0 < ω < 2, as well as for more general smoothers.

Inequality (4.3) yields the convergence of the coarse-to-fine V-cycle $M^{/}_k(\cdot,\cdot)$, with the bound $\sqrt{\gamma}$ on the convergence factor

$$\rho_k := \inf\{\varepsilon : |||U_k - MG(k, u_k, F_k)||| \le \varepsilon\,|||U_k - u_k||| \ \text{for all } u_k\}. \qquad (4.8)$$

Thus (3.4) and (3.5) yield the convergence of the symmetric V-cycle $MG(k,\cdot,\cdot)$ with the convergence bound γ; see [8]. Inequality (4.4) implies the convergence of both $M^{/}_k(\cdot,\cdot)$ and $M^{\backslash}_k(\cdot,\cdot)$ with the convergence bound $1/\sqrt{1+\beta}$, and hence of the symmetric V-cycle with the convergence bound $1/(1+\beta)$; see [6], [7].
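As a quick arithmetic check (ours), the constants quoted in the proof, γ = 1 - 1/(1 + δ/(2m)) and β = 2m/δ, give the same symmetric V-cycle bound by both routes, since 1/(1 + β) = δ/(δ + 2m) = γ. The values of δ and m below are arbitrary; exact fractions avoid any rounding doubt.

```python
from fractions import Fraction

def gamma_from_delta(delta, m):
    """gamma = 1 - 1/(1 + delta/(2m)), the constant quoted for (4.3)."""
    return 1 - 1 / (1 + delta / (2 * m))

def beta_from_delta(delta, m):
    """beta = 2m/delta, the constant quoted for (4.4)."""
    return Fraction(2 * m) / delta

for delta in (Fraction(1), Fraction(2), Fraction(5, 2)):
    for m in (1, 2, 4):
        g, b = gamma_from_delta(delta, m), beta_from_delta(delta, m)
        # Symmetric V-cycle bound via (4.3) is gamma; via (4.4) it is 1/(1+beta).
        assert g == 1 / (1 + b) == delta / (delta + 2 * m)
        print(f"delta = {delta}, m = {m}: symmetric V-cycle bound {g} = {float(g):.3f}")
```

Doubling m roughly halves the bound when δ is small relative to 2m, which is the usual trade-off between smoothing work and contraction.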


5.  EQUIVALENCE  OF  DISCRETE  REGULARITY  AND  ELLIPTIC  REGULARITY 
The purpose of this section is to show that if the discrete regularity assumption (4.2) holds, then H² regularity (4.1) holds. This follows from an inverse theorem of Widlund [12, page 332] and the following lemma.

LEMMA. If the discrete regularity assumption holds, then there is a constant $C_1 > 0$ such that

$$\|u - U_k\|_{L^2(\Omega)} \le C_1 h^2 \|f\|_{L^2(\Omega)},$$

where $U_k$, $h = h_k$, is the solution of (2.2), and u is the solution of (2.1).

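Proof: We have

$$T_k L_k^{-1} = L_k^{-1} - I^k_{k-1} L_{k-1}^{-1} I^{k-1}_k.$$

Hence $T_k L_k^{-1}$ is symmetric and

$$\rho(T_k L_k^{-1}) = \|T_k L_k^{-1}\|_k.$$

Let $U_k$ be the solutions of (2.2) with $F_k$ given by (2.4). By (3.1),

$$\|T_k L_k^{-1}\|_k = \sup_{\|F_k\|_k = 1} \|U_k - U_{k-1}\|_k.$$

Because the norms $\|\cdot\|_k$ are uniformly equivalent to the $L^2(\Omega)$ norm, i.e.,

$$\exists\, C \ \ \forall u_k \in S_k:\quad \frac{1}{C}\|u_k\|_{L^2(\Omega)} \le \|u_k\|_k \le C\|u_k\|_{L^2(\Omega)},$$

we have

$$\frac{1}{C}\|U_k - U_{k-1}\|_{L^2(\Omega)} \le \|U_k - U_{k-1}\|_k \le \rho(T_k L_k^{-1})\,\|F_k\|_k.$$

From the definition of $F_k$, cf. (2.4), and the uniform equivalence of the norms $\|\cdot\|_k$ and $\|\cdot\|_{L^2(\Omega)}$, it follows that $\|F_k\|_k \le C\|f\|_{L^2(\Omega)}$, $f \in L^2(\Omega)$. Thus, using (4.2) and the fact that $\rho(L_k) \approx C h_k^{-2}$,

$$\forall f \in L^2(\Omega):\quad \|U_k - U_{k-1}\|_{L^2(\Omega)} \le C h_k^2 \|f\|_{L^2(\Omega)}.$$

Hence,

$$\|u - U_k\|_{L^2(\Omega)} \le \|U_k - U_{k+1}\|_{L^2(\Omega)} + \|U_{k+1} - U_{k+2}\|_{L^2(\Omega)} + \cdots \le 2C\delta h_k^2 \|f\|_{L^2(\Omega)},$$

using the fact that $\|u - U_j\|_{L^2(\Omega)} \le \|u - U_j\|_{H^1(\Omega)} \to 0$ as $j \to \infty$, which holds without any regularity assumptions. ∎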


6.  V-CYCLE  FOR  LESS  REGULAR  PROBLEMS 

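If Ω is not convex, then the discrete regularity (4.2) no longer holds. Instead, we have only (see [6], [7])

$$\exists\, \delta \ \text{so that}\ \forall k:\quad \rho(T_k L_k^{-1})\,\rho(L_k)^{\alpha} \le \delta. \qquad (6.1)$$

From [6], [7] we know that (6.1) gives: ∃ β > 0 so that for all $e \in S_k$ with $(G_k)^m e \ne 0$,

$$\frac{|||e|||^2}{|||(G_k)^m e|||^2} - 1 \ \ge\ \beta \left( \frac{|||T_k (G_k)^m e|||^2}{|||(G_k)^m e|||^2} \right)^{1/\alpha}$$

(compare with (4.4)!), which in turn yields convergence bounds $\rho_k$ for the one-sided V-cycles, given by the recursion

$$(\rho_k)^2 = \max_{0 \le \zeta \le 1} \frac{(\rho_{k-1})^2 + \left(1 - (\rho_{k-1})^2\right)\zeta}{1 + \beta\,\zeta^{1/\alpha}}. \qquad (6.2)$$

If α = 1, then this gives $(\rho_k)^2 = \max\{(\rho_{k-1})^2,\ 1/(1+\beta)\}$. If α < 1, then $\rho_k \to 1$ as $k \to \infty$.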
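The recursion (6.2) is easy to evaluate numerically, which gives a feel for how slowly the bounds approach 1. This sketch is ours, not the paper's: it approximates the maximum over ζ on a fine grid (so each value slightly underestimates the true maximum), starts from $\rho_0 = 0$ (exact coarsest-grid solve), and uses the hypothetical values α = 0.5, β = 1. It also checks the one-step estimate of Lemma 1 below with $C_1 = (1-\alpha)\left((1+\beta)/\beta\right)^{\alpha/(1-\alpha)}$ and exponent $a = 1/(1-\alpha)$, and the α = 1 case stated above.

```python
import numpy as np

ZETA = np.linspace(0.0, 1.0, 200001)    # grid for the max over 0 <= zeta <= 1

def next_rho2(rho2_prev, alpha, beta):
    """One step of (6.2): max over zeta of (p + (1-p) zeta) / (1 + beta zeta^(1/alpha))."""
    vals = (rho2_prev + (1.0 - rho2_prev) * ZETA) / (1.0 + beta * ZETA ** (1.0 / alpha))
    return vals.max()

def rho2_sequence(alpha, beta, kmax):
    rho2 = [0.0]                         # rho_0 = 0: exact solve on the coarsest grid
    for _ in range(kmax):
        rho2.append(next_rho2(rho2[-1], alpha, beta))
    return np.array(rho2)

alpha, beta = 0.5, 1.0                   # hypothetical values for illustration
a = 1.0 / (1.0 - alpha)
b = (1.0 - alpha) / alpha
C1 = (1.0 - alpha) * ((1.0 + beta) / beta) ** (alpha / (1.0 - alpha))

rho2 = rho2_sequence(alpha, beta, 200)
d = 1.0 - rho2                           # d_k = 1 - rho_k^2
for k in (1, 10, 50, 100, 200):
    print(f"k={k:4d}  rho_k^2={rho2[k]:.5f}  d_k * k^b = {d[k] * k**b:.4f}")

# Lemma 1: rho_k^2 <= rho_{k-1}^2 + C1 (1 - rho_{k-1}^2)^a whenever rho_{k-1}^2 > 1/(1+beta)
for k in range(1, len(rho2)):
    if rho2[k - 1] > 1.0 / (1.0 + beta):
        assert rho2[k] <= rho2[k - 1] + C1 * d[k - 1] ** a + 1e-12

# alpha = 1: the recursion reduces to max{rho_{k-1}^2, 1/(1+beta)}
for p in (0.1, 0.6, 0.9):
    assert np.isclose(next_rho2(p, 1.0, beta), max(p, 1.0 / (1.0 + beta)))
```

If $d_k k^b$ levels off rather than tending to 0, that is consistent with the $k^{-b}$ rate, $b = (1-\alpha)/\alpha$, established below.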

The purpose of this section is to show that while $\rho_k \to 1$, this convergence is slow. Indeed,

$$\exists\, C > 0 \ \text{so that}\ \forall k:\quad (\rho_k)^2 \le 1 - C k^{-(1-\alpha)/\alpha}.$$
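LEMMA 1. Suppose α < 1, let $C_1 = (1-\alpha)\left(\frac{1+\beta}{\beta}\right)^{\alpha/(1-\alpha)}$, and assume $(\rho_{k-1})^2 > 1/(1+\beta)$. Then

$$(\rho_k)^2 \le (\rho_{k-1})^2 + C_1\left(1 - (\rho_{k-1})^2\right)^{1/(1-\alpha)}. \qquad (6.3)$$

Proof: Write (6.2) as

$$(\rho_k)^2 = \max_{0 \le \xi \le 1} \frac{(\rho_{k-1})^2 + \left(1 - (\rho_{k-1})^2\right)\xi^{\alpha}}{1 + \beta\xi}, \qquad (6.2')$$

using the substitution $\xi = \zeta^{1/\alpha}$. Because the function $\xi \mapsto \xi^{\alpha}$ is concave, we have for any c ∈ (0, 1] and any ξ > 0 that

$$\xi^{\alpha} \le c^{\alpha} + (\xi - c)\,\alpha c^{\alpha-1} = \alpha c^{\alpha-1}\xi + (1-\alpha)c^{\alpha}.$$

Define $\xi_c$ as the solution of $\alpha c^{\alpha-1}\xi + (1-\alpha)c^{\alpha} = 1$. Then, using the fact that c ≤ 1,

$$\xi_c = \frac{1 - (1-\alpha)c^{\alpha}}{\alpha c^{\alpha-1}} \ge c^{1-\alpha}.$$

Hence, from (6.2') and since $\xi^{\alpha} \le 1$ for ξ ≤ 1,

$$\begin{aligned}(\rho_k)^2 &\le \max_{\xi \ge 0} \frac{(\rho_{k-1})^2 + \left(1-(\rho_{k-1})^2\right)\min\{1,\ \alpha c^{\alpha-1}\xi + (1-\alpha)c^{\alpha}\}}{1+\beta\xi} \\ &\le \max\left\{(\rho_{k-1})^2 + \left(1-(\rho_{k-1})^2\right)(1-\alpha)c^{\alpha},\ \frac{1}{1+\beta\xi_c}\right\} \\ &\le \max\left\{(\rho_{k-1})^2 + \left(1-(\rho_{k-1})^2\right)(1-\alpha)c^{\alpha},\ \frac{1}{1+\beta c^{1-\alpha}}\right\}.\end{aligned}$$

Let $c = \left(\frac{1-(\rho_{k-1})^2}{\beta(\rho_{k-1})^2}\right)^{1/(1-\alpha)}$. Note that c < 1 because $(\rho_{k-1})^2 > 1/(1+\beta)$. With this choice, $1/(1+\beta c^{1-\alpha}) = (\rho_{k-1})^2$, and $(1-\alpha)c^{\alpha} = (1-\alpha)\left(\beta(\rho_{k-1})^2\right)^{-\alpha/(1-\alpha)}\left(1-(\rho_{k-1})^2\right)^{\alpha/(1-\alpha)} \le C_1\left(1-(\rho_{k-1})^2\right)^{\alpha/(1-\alpha)}$, so the result follows immediately. ∎

Since $\rho_k \to 1$, we turn our attention to the small quantities $d_k = 1 - (\rho_k)^2 > 0$.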

LEMMA 2. Let $a = 1/(1-\alpha)$ and let C > 0 be a fixed constant. Let $\{Z_k\}$, $\{Y_k\}$, k = 1, 2, ..., be two sequences of positive numbers which satisfy

$$Z_k \to 0,\quad Y_k \to 0 \quad \text{as } k \to \infty,$$

$$Z_{k+1} \ge Z_k - C Z_k^{a}, \qquad (6.4)$$

$$Y_{k+1} \le Y_k - C Y_k^{a}, \qquad (6.5)$$

[condition (6.6): display illegible in the source]

Then

$$0 < Y_k \le Z_k, \qquad k = 1, 2, \ldots$$

Proof: Consider the function

$$g(x) = x - Cx^{a}.$$

This function is monotone increasing for $0 < x < (aC)^{-1/(a-1)}$. The proof proceeds by induction. Assume

$$0 < Y_{k-1} \le Z_{k-1}.$$

Then, using the monotonicity of g(x) together with (6.4) and (6.5), we have

$$Z_k \ge g(Z_{k-1}) \ge g(Y_{k-1}) \ge Y_k.$$

∎


THEOREM. There exists an index $k_0$ and a constant $b_1 > 0$ such that

$$\frac{b_1}{k^{b}} \le d_k, \qquad k \ge k_0,$$

where

$$b = \frac{1}{a-1} = \frac{1-\alpha}{\alpha}.$$


Proof: Let [definition illegible in the source]. Let $k_0 \ge 1$ be an integer such that

[condition illegible in the source]   (6.8)

Choose $b_1 > 0$ such that

[condition illegible in the source]   (6.7)
Now let

$$Z_k = d_{k_0+k-1}, \qquad Y_k = \frac{b_1}{(k_0+k-1)^{b}}, \qquad k = 1, 2, \ldots$$

Thus, (6.7) yields assumption (6.6) of Lemma 2. Of course, (6.3) yields (6.4). To apply Lemma 2, we need only to verify (6.5). We must show that

$$Y_{k+1} = \frac{b_1}{(k_0+k)^{b}} \le \frac{b_1}{(k_0+k-1)^{b}} - C_1\left(\frac{b_1}{(k_0+k-1)^{b}}\right)^{a} = g(Y_k),$$

or, equivalently,

$$C_1 \frac{b_1^{a}}{(k_0+k-1)^{ab}} \le b_1\left[\frac{1}{(k_0+k-1)^{b}} - \frac{1}{(k_0+k)^{b}}\right].$$

By the mean value theorem for the term in the brackets,

$$\frac{b}{(k_0+k)^{b+1}} \le \frac{1}{(k_0+k-1)^{b}} - \frac{1}{(k_0+k)^{b}}.$$

Since b + 1 = ab, it suffices to show that

$$C_1 \frac{b_1^{a}}{(k_0+k-1)^{b+1}} \le \frac{b\, b_1}{(k_0+k)^{b+1}}.$$

This final estimate follows from (6.7) and (6.8).

Thus, applying Lemma 2, we have shown that

$$\frac{b_1}{(k_0+k)^{b}} \le d_{k_0+k}, \qquad k = 0, 1, 2, \ldots,$$

and the theorem is proven. ∎

Let us restate this theorem in terms of $(\rho_k)^2$. We have

$$(\rho_k)^2 \le 1 - b_1 k^{-b}, \qquad k \ge k_0.$$

Note that

$$b = \frac{1}{a-1} = \frac{1-\alpha}{\alpha}.$$

Since $(\rho_k)^2$ is the corresponding upper bound for the rate of convergence of the symmetric V-cycle, we have established (1.2) for the symmetric V-cycle.


Remark. Since the quantities $\rho_k$ determined by (6.2) are upper bounds for the actual convergence factors, we have not emphasized lower bounds for these quantities. However, for the sake of completeness, we mention that it is easy to obtain a constant $C_2 > 0$ for which one has

$$(\rho_k)^2 \ge (\rho_{k-1})^2 + C_2\left[1 - (\rho_{k-1})^2\right]^{a}.$$

Then, using Lemma 2, one obtains that there is a constant $b_2 > 0$ such that

$$(\rho_k)^2 \ge 1 - b_2 k^{-b}.$$
ABSTRACT

The flux-splitting method is applied to the convective part of the steady Navier-Stokes equations for incompressible flow. Partial upwind differences are introduced in this split first order part, while central differences are used in the second order part. The set of discrete equations obtained in this way has the property of positiveness, so that it can be solved by collective variants of relaxation methods.

It  is  shown  that,  with  the  use  of  an  optimum  partial  upwinding,  accurate 
results  can  be  obtained  and  that  the  relaxation  method  can  be  brought  into 
multigrid  form. 

1.  INTRODUCTION 

The flux-vector splitting method was introduced by Steger and Warming [1] to solve the unsteady Euler equations. Further, it was shown by Jespersen [2] that the flux-vector splitting method can also be used on the steady Euler equations, to generate discrete equations which form a positive set, so that a solution by relaxation methods, in multigrid form, is possible.

In this paper, the flux-vector splitting method is applied to the convective (i.e. Euler) part of the Navier-Stokes equations for incompressible flow. The fundamentals of this flux-vector splitting method were already outlined by the author in [3]. It was shown there that accurate results can be obtained. In this paper, a multigrid version of the algorithm is described.
2.  UPWIND  DIFFERENCING  FOR  SYSTEMS  OF  EQUATIONS

A system of steady convective equations (e.g. the Euler equations) can be written in linearized form as

$$A\frac{\partial \xi}{\partial x} + B\frac{\partial \xi}{\partial y} = 0, \qquad (1)$$

where ξ is the vector of dependent variables.

For a convective system, the matrices A and B have real eigenvalues and a complete set of eigenvectors. As a consequence, it is always possible to split the matrices into a sum of a matrix with positive eigenvalues and a matrix with negative eigenvalues:

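$$A = A^+ + A^-, \qquad B = B^+ + B^-.$$

Equation (1) then can be written in split form as

$$A^+\frac{\partial \xi}{\partial x} + A^-\frac{\partial \xi}{\partial x} + B^+\frac{\partial \xi}{\partial y} + B^-\frac{\partial \xi}{\partial y} = 0. \qquad (2)$$

An upwind discretization of (2) is then obtained when the + terms are discretized by backward differences and the - terms by forward differences. For example, on a regular grid, this is

$$A^+(\xi_{i,j} - \xi_{i-1,j}) + A^-(\xi_{i+1,j} - \xi_{i,j}) + B^+(\xi_{i,j} - \xi_{i,j-1}) + B^-(\xi_{i,j+1} - \xi_{i,j}) = 0$$

or

$$(A^+ + B^+ - A^- - B^-)\,\xi_{i,j} = A^+\xi_{i-1,j} + (-A^-)\xi_{i+1,j} + B^+\xi_{i,j-1} + (-B^-)\xi_{i,j+1}. \qquad (3)$$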


Although it is not a general rule, for a large class of systems the coefficient matrix $C = A^+ + B^+ - A^- - B^-$ has positive eigenvalues. In this case, the system of equations (3) forms a positive set of equations, all matrix coefficients having non-negative eigenvalues. It is clear that collective variants of relaxation methods can be used on positive sets of equations.

A systematic way to split the matrices A and B in (1) is the flux-vector splitting technique of Steger and Warming [1], based on the splitting of the eigenvalue matrices.

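By denoting the eigenvalue matrices of A and B by $\Lambda_A$ and $\Lambda_B$ and the left eigenvector matrices by $X_A$ and $X_B$, we have

$$A = X_A^{-1}\Lambda_A X_A, \qquad B = X_B^{-1}\Lambda_B X_B.$$

The eigenvalue matrices can be split into

$$\Lambda_A = \Lambda_A^+ + \Lambda_A^-, \qquad \Lambda_B = \Lambda_B^+ + \Lambda_B^-,$$

where

$$\Lambda_A^+ = \mathrm{diag}(\lambda_{iA}^+), \quad \Lambda_A^- = \mathrm{diag}(\lambda_{iA}^-), \quad \lambda_{iA}^+ = \max(\lambda_{iA}, 0), \quad \lambda_{iA}^- = \min(\lambda_{iA}, 0),$$

and similarly for B.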
The split matrices then are obtained by

$$A^+ = X_A^{-1}\Lambda_A^+ X_A, \quad A^- = X_A^{-1}\Lambda_A^- X_A, \quad B^+ = X_B^{-1}\Lambda_B^+ X_B, \quad B^- = X_B^{-1}\Lambda_B^- X_B.$$
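A hedged numerical sketch of this splitting: for a matrix with real eigenvalues and a complete set of eigenvectors, clip the eigenvalues and transform back. NumPy's `eig` returns right eigenvectors as columns of R, so X is taken as $R^{-1}$ (its rows are then left eigenvectors, and $A = X^{-1}\Lambda X$). The function name and the test matrix are ours, chosen only for illustration.

```python
import numpy as np

def steger_warming_split(M):
    """Split M = X^{-1} Lambda X into M^+ + M^-, where M^+ keeps the positive and
    M^- the negative eigenvalues. Assumes a real spectrum and a complete set of
    eigenvectors (the 'convective system' case)."""
    lam, R = np.linalg.eig(M)           # columns of R: right eigenvectors
    lam, R = lam.real, R.real           # real spectrum assumed
    X = np.linalg.inv(R)                # rows of X: left eigenvectors, X M = Lambda X
    M_plus = R @ np.diag(np.maximum(lam, 0.0)) @ X
    M_minus = R @ np.diag(np.minimum(lam, 0.0)) @ X
    return M_plus, M_minus

rng = np.random.default_rng(0)
S = rng.normal(size=(3, 3)) + 3.0 * np.eye(3)          # invertible change of basis
A = S @ np.diag([3.0, -1.0, 2.0]) @ np.linalg.inv(S)   # real eigenvalues 3, -1, 2
Ap, Am = steger_warming_split(A)
print(np.allclose(Ap + Am, A))
print(np.sort(np.linalg.eigvals(Ap).real), np.sort(np.linalg.eigvals(Am).real))
```

Because $\Lambda^+\Lambda^- = 0$, the two parts also satisfy $A^+A^- = 0$, which is a cheap sanity check on any implementation.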


3.  FLUX-VECTOR  SPLITTING  FOR  STEADY  NAVIER-STOKES  EQUATIONS

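The steady Navier-Stokes equations for an incompressible fluid are

$$\begin{aligned} u\frac{\partial u}{\partial x} + v\frac{\partial u}{\partial y} + \frac{\partial p}{\partial x} &= \nu\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right), \\ u\frac{\partial v}{\partial x} + v\frac{\partial v}{\partial y} + \frac{\partial p}{\partial y} &= \nu\left(\frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2}\right), \\ c^2\frac{\partial u}{\partial x} + c^2\frac{\partial v}{\partial y} &= 0, \end{aligned} \qquad (4)$$

where u and v are the Cartesian components of velocity, c is a reference velocity introduced to homogenize the eigenvalues of the system matrices, ν is the kinematic viscosity and p is pressure divided by density.

In system form, the set of equations (4) becomes

$$\begin{pmatrix} u & 0 & 1 \\ 0 & u & 0 \\ c^2 & 0 & 0 \end{pmatrix}\frac{\partial}{\partial x}\begin{pmatrix} u \\ v \\ p \end{pmatrix} + \begin{pmatrix} v & 0 & 0 \\ 0 & v & 1 \\ 0 & c^2 & 0 \end{pmatrix}\frac{\partial}{\partial y}\begin{pmatrix} u \\ v \\ p \end{pmatrix} = \begin{pmatrix} \nu & 0 & 0 \\ 0 & \nu & 0 \\ 0 & 0 & 0 \end{pmatrix}\left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}\right)\begin{pmatrix} u \\ v \\ p \end{pmatrix},$$

or symbolically

$$A\frac{\partial \xi}{\partial x} + B\frac{\partial \xi}{\partial y} = D\left(\frac{\partial^2 \xi}{\partial x^2} + \frac{\partial^2 \xi}{\partial y^2}\right).$$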

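The eigenvalues of the system matrices A and B are

$$\lambda_{1A} = u, \qquad \lambda_{2A} = \frac{u + \sqrt{u^2+4c^2}}{2}, \qquad \lambda_{3A} = \frac{u - \sqrt{u^2+4c^2}}{2},$$

$$\lambda_{1B} = v, \qquad \lambda_{2B} = \frac{v + \sqrt{v^2+4c^2}}{2}, \qquad \lambda_{3B} = \frac{v - \sqrt{v^2+4c^2}}{2}.$$

$\lambda_{2A}$ and $\lambda_{2B}$ are always positive, $\lambda_{3A}$ and $\lambda_{3B}$ are always negative, and $\lambda_{1A}$ and $\lambda_{1B}$ change sign with u and v. Hence

$$\Lambda_A^+ = \mathrm{diag}(u^+, \lambda_{2A}, 0), \qquad \Lambda_A^- = \mathrm{diag}(u^-, 0, \lambda_{3A}),$$

$$\Lambda_B^+ = \mathrm{diag}(v^+, \lambda_{2B}, 0), \qquad \Lambda_B^- = \mathrm{diag}(v^-, 0, \lambda_{3B}),$$

with $u^+ = \max(u,0)$, $u^- = \min(u,0)$, $v^+ = \max(v,0)$, $v^- = \min(v,0)$.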


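According to the procedure of Steger and Warming, the split matrices become

$$A^+ = X_A^{-1}\Lambda_A^+X_A = \begin{pmatrix} a^+ & 0 & \alpha_1 \\ 0 & u^+ & 0 \\ \alpha_1 c^2 & 0 & a \end{pmatrix}, \qquad A^- = X_A^{-1}\Lambda_A^-X_A = \begin{pmatrix} a^- & 0 & \alpha_2 \\ 0 & u^- & 0 \\ \alpha_2 c^2 & 0 & -a \end{pmatrix},$$

$$B^+ = X_B^{-1}\Lambda_B^+X_B = \begin{pmatrix} v^+ & 0 & 0 \\ 0 & b^+ & \beta_1 \\ 0 & \beta_1 c^2 & b \end{pmatrix}, \qquad B^- = X_B^{-1}\Lambda_B^-X_B = \begin{pmatrix} v^- & 0 & 0 \\ 0 & b^- & \beta_2 \\ 0 & \beta_2 c^2 & -b \end{pmatrix},$$

where

$$a^+ = \alpha_1 u + a, \quad a^- = \alpha_2 u - a, \quad \alpha_1 = .5(1+\bar a), \quad \alpha_2 = .5(1-\bar a), \quad a = \frac{c^2}{\sqrt{u^2+4c^2}}, \quad \bar a = \frac{u}{\sqrt{u^2+4c^2}},$$

$$b^+ = \beta_1 v + b, \quad b^- = \beta_2 v - b, \quad \beta_1 = .5(1+\bar b), \quad \beta_2 = .5(1-\bar b), \quad b = \frac{c^2}{\sqrt{v^2+4c^2}}, \quad \bar b = \frac{v}{\sqrt{v^2+4c^2}}.$$

The split form of the system (6) becomes

$$A^+\frac{\partial \xi}{\partial x} + A^-\frac{\partial \xi}{\partial x} + B^+\frac{\partial \xi}{\partial y} + B^-\frac{\partial \xi}{\partial y} = D\left(\frac{\partial^2 \xi}{\partial x^2} + \frac{\partial^2 \xi}{\partial y^2}\right). \qquad (7)$$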

The  split  form  of  the  system  (6)  becomes  : 


1£ 

9x 


li 

9x 


i£ 

8y 


li 

3y 


)2‘ 

'3x‘ 


,1^  +  1% 
+  3y2 


(7) 


On  a  rectangular  grid,  using  upwind  differences  in  the  first  order  part  and 
central  differences  in  the  second  order  part,  a  positive  set  of  equations  is 
obtained. 

However,  since  the  momentum  equations  in  (7)  have  terms  in  the  velocity 
differences  from  the  convective  part  and  the  diffusive  part,  a  partial  upwind 
formulation  is  possible  for  these  equations,  retaining  the  positiveness. 

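For example, the momentum-x equation can be discretized as

$$\begin{aligned} &a^+\left[\theta_{xx}\,\delta_x^+u + (1-\theta_{xx})\,\delta_x u\right] + a^-\left[\theta_{xx}\,\delta_x^-u + (1-\theta_{xx})\,\delta_x u\right] \\ &+ v^+\left[\theta_{xy}\,\delta_y^+u + (1-\theta_{xy})\,\delta_y u\right] + v^-\left[\theta_{xy}\,\delta_y^-u + (1-\theta_{xy})\,\delta_y u\right] \\ &+ \alpha_1\,\delta_x^+p + \alpha_2\,\delta_x^-p = \nu\left(\delta_x^2 u + \delta_y^2 u\right), \end{aligned}$$

where

$$\delta_x^+u = (u_{i,j} - u_{i-1,j})/\Delta x_w, \qquad \delta_x^-u = (u_{i+1,j} - u_{i,j})/\Delta x_e,$$

$$\delta_x u = \frac{\Delta x_e}{2\Delta x}\,\delta_x^+u + \frac{\Delta x_w}{2\Delta x}\,\delta_x^-u, \qquad \delta_x^2 u = (\delta_x^-u - \delta_x^+u)/\Delta x,$$

$$\delta_y^+u = (u_{i,j} - u_{i,j-1})/\Delta y_s, \qquad \delta_y^-u = (u_{i,j+1} - u_{i,j})/\Delta y_n,$$

$$\delta_y u = \frac{\Delta y_n}{2\Delta y}\,\delta_y^+u + \frac{\Delta y_s}{2\Delta y}\,\delta_y^-u, \qquad \delta_y^2 u = (\delta_y^-u - \delta_y^+u)/\Delta y,$$

with

$$\Delta x_w = x_{i,j} - x_{i-1,j}, \qquad \Delta x_e = x_{i+1,j} - x_{i,j}, \qquad \Delta x = .5(\Delta x_w + \Delta x_e),$$

$$\Delta y_s = y_{i,j} - y_{i,j-1}, \qquad \Delta y_n = y_{i,j+1} - y_{i,j}, \qquad \Delta y = .5(\Delta y_s + \Delta y_n).$$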
A similar discretization can be used for the momentum-y equation, involving $\theta_{yx}$ and $\theta_{yy}$. The mass equation in (7) is to be discretized in a fully upwind way.

The optimum values for the partial upwind coefficients $\theta_{xx}$, $\theta_{xy}$, $\theta_{yx}$, $\theta_{yy}$ can be determined by requiring that the linearized form of the discrete equations (i.e. with coefficients like $u^+$, $u^-$, $v^+$, $v^-$, ... considered as constant) is satisfied by solutions of the form

$$e^{ux/\nu}, \qquad e^{vy/\nu}.$$

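In this way, the optimum value of $\theta_{xx}$ is found to be given by an expression (8) [only partly legible in the source: it involves the exponential factors $1 - e^{-u\Delta x_w/\nu}$ and $e^{u\Delta x_e/\nu} - 1$ and the denominator $u^+\Delta x_w - u^-\Delta x_e$].

Similar expressions are found for $\theta_{xy}$, $\theta_{yx}$ and $\theta_{yy}$.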
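To illustrate what "optimum" partial upwinding achieves, consider a scalar one-dimensional analog on a uniform grid (an assumption made for illustration; expression (8) covers stretched grids and the split system). For $uU_x = \nu U_{xx}$ with u > 0, the blend θ·(backward difference) + (1 - θ)·(central difference) reproduces the exponential solution $e^{ux/\nu}$ exactly at the nodes when θ = coth(Pe/2) - 2/Pe with Pe = uΔx/ν. This is the classical exponentially fitted upwind factor; its small-Pe expansion is Pe/6. The function names below are ours.

```python
import numpy as np

def theta_opt(pe):
    """Exponentially fitted blending factor between backward and central differences."""
    return 1.0 / np.tanh(pe / 2.0) - 2.0 / pe

def theta_approx(pe):
    """Small-Pe replacement Pe/6, capped at 1."""
    return np.minimum(pe / 6.0, 1.0)

def scheme_residual(theta, u, nu, dx, x):
    """Residual of u*[theta*D_back + (1-theta)*D_central]U - nu*D_2 U at node x, U = exp(u x/nu)."""
    U = lambda s: np.exp(u * s / nu)
    back = (U(x) - U(x - dx)) / dx
    central = (U(x + dx) - U(x - dx)) / (2.0 * dx)
    second = (U(x + dx) - 2.0 * U(x) + U(x - dx)) / dx**2
    return u * (theta * back + (1.0 - theta) * central) - nu * second

u, nu, x = 1.0, 0.1, 0.3
for pe in (0.1, 0.5, 2.0, 10.0):
    dx = pe * nu / u
    r_opt = scheme_residual(theta_opt(pe), u, nu, dx, x)
    r_cen = scheme_residual(0.0, u, nu, dx, x)
    print(f"Pe={pe:5.1f}  theta_opt={theta_opt(pe):.4f}  Pe/6 capped={theta_approx(pe):.4f}  "
          f"residual(opt)={r_opt:.2e}  residual(central)={r_cen:.2e}")
```

The optimal residual is at round-off level while the purely central scheme (θ = 0) is not; for large Pe the fitted factor approaches 1 (full upwinding), which is why the Pe/6 replacement is capped at 1.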

As is common practice for scalar equations, the expressions for the optimum partial upwind coefficients can be replaced by their expansions for small values of velocity, with a maximum of 1. Expansion of (8) leads to

$$\theta_{xx} = \min\{Pe_{xx}/6,\ 1\}, \qquad (9)$$

where the Péclet number is

$$Pe_{xx} = (u^+\Delta x_w - u^-\Delta x_e)/\nu.$$


4.  NUMERICAL  EXAMPLE

Figure 1 shows a well-known backward facing step problem from [4], discretized with a coarse grid with 42 elements. This grid is the coarsest of a series of four, the finest grid having 2688 elements. In the construction of the finer grids, the same stretching law is applied as in the coarse grid.

The following boundary conditions are imposed. At inlet: $u = u_0(y)$, in which $u_0(y)$ is a parabolic profile with a mean velocity c, v = 0, and p from a combination of the momentum-x equation and the mass equation such that, in the convective part, derivatives in the upstream direction are eliminated. This equation is further simplified by an assumption of fully developed flow upstream of the inlet section. The result is

$$.5\left(u - \sqrt{u^2+4c^2}\right)\delta_x^-u + \delta_x^-p = \nu\,\delta_y^2 u.$$

FIG.  1.  Backward  facing  step  problem  discretized  with  a  coarse  grid


At outlet: p = 0, v = 0, and u from a combination of the momentum-x equation and the mass equation such that, in the convective part, derivatives in the downstream direction are eliminated. Again with an assumption of fully developed flow downstream of the outlet section, the result is

$$.5\left(u + \sqrt{u^2+4c^2}\right)\delta_x^+u + \delta_x^+p = \nu\,\delta_y^2 u.$$

At solid boundaries: u = 0, v = 0, and p from a combination of the mass equation and the momentum equations, such that derivatives in the outgoing direction are eliminated. For instance, at the horizontal part of the bottom boundary this gives [equation only partly legible in the source; its right-hand side is $-2\nu\,\delta_y^2 v$], where $\delta_y^2 v$ is the second derivative, calculated inward, taking into account that $\delta_y v = 0$.

A similar expression holds on vertical parts of the boundary. In the corner points, the mean value obtained from both expressions is used.

Figure 2 shows the solution obtained with a successive underrelaxation method (relaxation factor 0.95) in red-black ordering for

$$Re = U_{max}h/\nu = 150,$$

where $U_{max}$ is the maximum value of the velocity at the inlet section and h is the step height. The streamlines shown in Figure 2 were obtained by integration of the calculated velocity profiles. The reattachment length to step height ratio is about 6. This result is in accordance with the experimental value [4].


FIG.  2.  Streamline  pattern  for  the  backward  facing  step  problem,  obtained  at 
the  finest  grid 
5.  MULTIGRID  FORMULATION

All equations are normalized by bringing the coefficient of u, v and p in the central node, for the momentum-x, momentum-y and pressure (mass) equation respectively, to the value 1. As a result, field equations and boundary equations take a similar form. This allows the use of full weighting as restriction operator for defects at the boundaries, taking a weighted mean of the defects of boundary equations and field equations.

Successive underrelaxation in red-black form was chosen as the relaxation algorithm. For a system of first order equations the maximum relaxation factor for stability is 1 (not 2). The maximum convergence rate for a single grid calculation was found to be obtained for a relaxation factor of 0.95. Although it is well known that red-black relaxation does not have optimum smoothing properties, this algorithm was chosen for its ease in vectorizing the code.

A full approximation scheme was used. For the restriction operator, experiments were done with full weighting both on function values and on defects, with injection for function values and full weighting for defects, and with injection for both. In the full weighting versions, experiments were done with full weighting at the boundaries and with weighting restricted to boundary points. Bilinear interpolation was used as the prolongation operator.

The cycle configuration was chosen to be the V-cycle. Nested iteration was not used as a starting cycle. Similar to the case of linear systems of equations [5], it was found to be beneficial to increase the number of iterations with the coarseness of the grid. The optimum number of pre- and post-relaxations was found to be

 grid:   h   2h   4h   8h   4h   2h   h
 V:      1   2    3    8    2    1    0

The best efficiency of the multigrid cycle was found to be reached for a relaxation factor ω = 0.85.

It was found that the performance is rather insensitive to the choice of the restriction operator. Using full weighting for defects is slightly more efficient than using injection, in terms of the required number of cycles. However, since injection requires fewer residue evaluations, the performance in terms of work units is about the same. The performance is also not sensitive to the precise weighting formula: algebraic weighting (i.e. weighting factors 1/2, 1/4, 1/8) or geometric weighting (i.e. weighting factors taking into account the distances between nodes). Therefore, due to its simplicity, injection both for functions and for defects was retained for further use.

FIG.  3.  Convergence  history  for  single  grid  (ω = 0.95)  and  multigrid  (ω = 0.85)  red-black  relaxation,  using  injection  as  restriction  operator

Figure 3 shows the convergence history for a single grid calculation and a multigrid calculation. The initial condition is a flow with v = 0 and p = 0 everywhere, with u equal to the inlet profile in the upper part of the flow field and u = 0 in the lower part. In the evaluation of the work of a cycle, on the finest grid, a relaxation and a residue calculation with the associated grid transfer are counted as one work unit. The work of one cycle is

$$\nu_h + \tfrac{1}{4}(2 + \nu_{2h}) + \tfrac{1}{16}(2 + \nu_{4h}) + \tfrac{1}{64}(2 + \nu_{8h}) = 2.85$$
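The quoted work figure can be reproduced from the relaxation schedule, assuming (our reading, not stated explicitly in the text) that $\nu_h$, $\nu_{2h}$, $\nu_{4h}$, $\nu_{8h}$ are the total numbers of pre- plus post-relaxations per cycle on each grid, and that a grid with spacing 2h, 4h, 8h carries 1/4, 1/16, 1/64 of the fine-grid work. Exact fractions keep the arithmetic transparent.

```python
from fractions import Fraction

# Total relaxations (pre + post) per V-cycle on each grid, read off the schedule:
# h: 1 + 0, 2h: 2 + 1, 4h: 3 + 2, coarsest grid 8h: 8 (assumed interpretation).
NU = {1: 1 + 0, 2: 2 + 1, 4: 3 + 2, 8: 8}

def cycle_work(nu):
    """nu_h + (1/4)(2 + nu_2h) + (1/16)(2 + nu_4h) + (1/64)(2 + nu_8h), in fine-grid work units."""
    work = Fraction(nu[1])
    for spacing in (2, 4, 8):
        # a grid with spacing `spacing`*h has about 1/spacing^2 as many nodes as the finest grid
        work += Fraction(2 + nu[spacing], spacing**2)
    return work

w = cycle_work(NU)
print(w, float(w))   # prints: 91/32 2.84375
```

The exact value 91/32 = 2.84375 rounds to the 2.85 quoted above, which supports this reading of the schedule.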


6.  CONCLUSION 

It  was  shown  that  the  flux-vector  splitting  technique  can  be  applied  to  steady 
Navier-Stokes  equations  in  incompressible  flow,  leading  to  discrete  equations 
which  can  be  solved  by  vector  variants  of  relaxation  schemes  in  multigrid 
form. 
ABSTRACT: Parallel algorithms are developed in the setting of iterative multilevel methods. The constituent parts of the algorithms are dependent rather than independent as in conventional parallel algorithms. We develop cases and conditions wherein this dependence generates a constructive interference in the computation. The resulting parallel algorithms can then be more efficient than their serial counterparts. Unlike standard parallel versions of multilevel algorithms, our algorithms are nontelescoping. Hence, all processors which can be effectively used on the finest level can also be used on the coarser levels in a natural fashion.


1.  INTRODUCTION 

There are a number of ways of parallelizing multilevel algorithms. In [4], performing each operation in parallel a single level at a time is suggested. This is a simple approach to parallel computation, but it can be made to work with a shuffle communication technique between processors. In [12], computing on all the levels in parallel as well as doing the operations per level in parallel is suggested. This leads to an algorithm that requires very large amounts of data transfer between processors to maintain stability. Our algorithms are based on the idea that all processors that can be effectively used on the finest level should be usable on the coarser levels as well (i.e., they are nontelescoping). Further, not much information should be transmitted between levels.

These  as  well  as  (seemingly)  all  parallel  algorithms  depend  on  finding  independent  parts 
of  serial  algorithms;  these  parts  are  performable  simultaneously  on  separate  processors.  Viewed 
in  this  way,  parallelization  of  algorithms  is  basically  a  combinatorial  or  data  flow  problem.  One 
side  effect  of  this  approach  is  the  reduction  of  computational  efficiency  of  parallel  algorithms. 

Multilevel iterative algorithms suggest another approach to parallelization, in which dependence rather than independence of the constituent parallel parts is the desirable property.

These constituent parts are smaller, cheaper-to-perform versions of the original problem. Because of the dependence, the simultaneous computation and interaction in the iteration may be viewed as setting up an interference between the computations being performed in the constituent parts. The point is that for appropriate problems this interference is constructive, resulting in efficiencies capable of exceeding those of the serial counterparts. As an extreme of this phenomenon, there are cases in which the methods are direct rather than iterative, i.e., convergence occurs in one iteration. We give details of this in Section 2 and discuss implementation issues in Section 3.


2.  PARALLEL  ALGORITHMS  AND  ONE  ITERATION  CONVERGENCE  CRITERIA 

In this section, we develop parallel algorithms that exhibit the constructive interference concept. We are principally interested in two-level algorithms since they are easier to motivate and understand. However, we do define algorithms that have more than two levels. We conclude this section by stating some theorems that give conditions under which these algorithms converge in one iteration.

We illustrate the constructive interference concept using a standard two-level algorithm. Assume $N > 1$, $A \in \mathbb{R}^{N\times N}$ has full rank, and $b \in \mathbb{R}^N$. We seek the unique $x^* \in \mathbb{R}^N$ satisfying

$$Ax^* = b. \tag{2.1}$$

Given an approximation $x$ to $x^*$, we determine a correction $c$ to $x$ by solving a system of size $p < N$, the so-called aggregated problem. We construct full-rank (rank $p$ in this case) matrices $R$ and $P$ satisfying

$$R: \mathbb{R}^N \to \mathbb{R}^p \quad\text{and}\quad P: \mathbb{R}^p \to \mathbb{R}^N.$$

$R$ and $P$ are called restriction and prolongation matrices. Then the aggregated problem is

$$(RAP)c = g, \tag{2.2}$$

for some $g \in \mathbb{R}^p$. We assume that the aggregated matrix $RAP$ is nonsingular. Instead of (2.2), the aggregation/disaggregation technique would use the problem

$$(\Pi A \Pi)c = g, \qquad \Pi = PR,$$

for some $g \in \mathbb{R}^N$.
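The difference in size between the two formulations can be seen numerically. The following is a minimal NumPy sketch with random matrices; the sizes, the seed, and the choice of $R$ as the left inverse $(P^TP)^{-1}P^T$ (so that $RP = I$) are illustrative assumptions, not part of the text.

```python
import numpy as np

# Hypothetical sizes and matrices, chosen only for illustration.
rng = np.random.default_rng(0)
N, p = 6, 2
A = rng.standard_normal((N, N)) + N * np.eye(N)  # shifted to keep A and RAP well conditioned

P = rng.standard_normal((N, p))                  # full-rank prolongation R^p -> R^N
R = np.linalg.solve(P.T @ P, P.T)                # left inverse of P, so R @ P = I_p

RAP = R @ A @ P                                  # aggregated p x p matrix of (2.2)
Pi = P @ R                                       # Pi = PR: an N x N matrix of rank p

print(RAP.shape, Pi.shape)
print(np.allclose(R @ P, np.eye(p)))             # RP = I
print(np.allclose(Pi @ Pi, Pi))                  # Pi is a projection
print(np.linalg.matrix_rank(Pi @ A @ Pi))        # Pi A Pi is singular whenever p < N
```

The aggregated matrix $RAP$ is a small nonsingular system, whereas $\Pi A \Pi$ keeps the full size $N$ but has rank only $p$.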

For certain classes of matrices $A$, we can solve (2.1) using a standard two-level algorithm composed of a smoothing part and a correction part:

Algorithm MG(z, b, m, k):

|z = initial guess and approximate solution
|b = right-hand side
|m = number of smoothing iterations
|k = number of correction iterations used

(1) Smooth m times on z to get $x_1$.

(2) Do i = 1, 2, ..., k:

(a) Set $r_i = b - Ax_i$.

(b) Solve $(RAP)c = Rr_i$.

(c) Set $x_{i+1/2} = x_i + Pc$.

(d) Smooth m times on $x_{i+1/2}$ to get $x_{i+1}$.

(3) Set $z = x_{k+1}$.

The smoothing step (2d) is usually a scaled iterative method. Typical smoothers used in practice are based on relaxation, incomplete factorization methods, and Krylov space methods (e.g., conjugate gradients). This algorithm can be extended to include any number of levels as well as a parameter governing the number of correction iterations on each level. Analysis of this algorithm for two or more levels can be found in [1, 3, 7, 8, 13, 15, 16, 17, 18].
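Algorithm MG can be sketched in NumPy as follows. The text leaves the smoother and the transfer operators open, so several choices here are assumptions: damped Jacobi ($\omega = 2/3$) as the smoother, a one-dimensional Poisson matrix as the test problem, linear interpolation for $P$, and $R = P^T$. The matrices $A$, $R$, $P$ are passed as extra arguments.

```python
import numpy as np

def poisson_1d(n):
    """Hypothetical test matrix tridiag(-1, 2, -1) of order n."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def linear_prolongation(n_coarse):
    """Linear interpolation to 2*n_coarse + 1 fine points (an assumed choice of P)."""
    P = np.zeros((2 * n_coarse + 1, n_coarse))
    for j in range(n_coarse):
        P[2 * j:2 * j + 3, j] = [0.5, 1.0, 0.5]
    return P

def smooth(A, x, b, m, omega=2.0 / 3.0):
    """m sweeps of damped Jacobi, standing in for the unspecified smoother."""
    d = np.diag(A)
    for _ in range(m):
        x = x + omega * (b - A @ x) / d
    return x

def mg(z, b, m, k, A, R, P):
    """Two-level Algorithm MG(z, b, m, k)."""
    x = smooth(A, z, b, m)                   # (1) smooth m times on z to get x_1
    RAP = R @ A @ P
    for _ in range(k):                       # (2) i = 1, ..., k
        r = b - A @ x                        # (a) residual r_i
        c = np.linalg.solve(RAP, R @ r)      # (b) aggregated (coarse) problem
        x = x + P @ c                        # (c) x_{i+1/2}
        x = smooth(A, x, b, m)               # (d) x_{i+1}
    return x                                 # (3) z = x_{k+1}

# Usage: N = 31 fine unknowns, p = 15 coarse unknowns.
A = poisson_1d(31)
P = linear_prolongation(15)
R = P.T                                      # assumed restriction (full weighting up to a factor 2)
b = np.ones(31)
x = mg(np.zeros(31), b, m=2, k=20, A=A, R=R, P=P)
x_exact = np.linalg.solve(A, b)
print(np.linalg.norm(x - x_exact) / np.linalg.norm(x_exact))
```

Forming $RAP$ once outside the loop reflects that the aggregated matrix does not change between correction iterations.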
We propose parallel algorithms based on the concept that while the standard correction step is being computed in $\mathbb{R}^p$, other computations in $\mathbb{R}^{N-p}$ replacing the smoothing step can go on. In particular, if the cost of these two steps is roughly the same, we should be able to do both simultaneously, combine the $p$-vector and the $(N-p)$-vector appropriately, and repeat the iteration. A second and extremely important consideration is keeping the amount of data transfer between processors to a minimum.

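Given positive integers $p_i$, $i = 1, \dots, j$, let $R_i$ and $P_i$ be full-rank matrices where

$$R_i: \mathbb{R}^N \to \mathbb{R}^{p_i}, \qquad P_i: \mathbb{R}^{p_i} \to \mathbb{R}^N, \qquad \text{and let} \quad \sum_{i=1}^{j} p_i = N.$$

For the aggregation/disaggregation formulation, we require

$$\Pi_i = P_i R_i, \quad \text{where } R_i P_i = I, \quad i = 1, \dots, j,$$

and the projections $\Pi_i$ to be mutually orthogonal.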

Algorithm MG is transformed into

Algorithm PMG(z, b, k, {P_i}, {R_i}, j):

|z = initial guess and approximate solution
|b = right-hand side
|k = number of iterations
|{P_i} = set of prolongation matrices
|{R_i} = set of restriction matrices
|j = number of restriction/prolongation matrices

(1) Set $x_1 = z$.

(2) Do i = 1, 2, ..., k:

(a) Set $r_i = b - Ax_i$.

(b) Solve in parallel $(R_m A P_m)c_m = R_m r_i$, $m = 1, \dots, j$.

(c) Set $x_{i+1} = x_i + P_1c_1 + \cdots + P_jc_j$.

(3) Set $z = x_{k+1}$.
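A minimal NumPy sketch of Algorithm PMG follows. The $j$ aggregated solves of step (2b) are independent; here they run in an ordinary loop, where each could be assigned to its own processor. The usage example is a hypothetical choice: each $R_m$ selects one block of unknowns and $P_m = R_m^T$, in which case PMG reduces to block Jacobi.

```python
import numpy as np

def pmg(z, b, k, Ps, Rs, A):
    """Algorithm PMG(z, b, k, {P_i}, {R_i}, j) with j = len(Ps)."""
    coarse = [R @ A @ P for P, R in zip(Ps, Rs)]                # aggregated matrices R_m A P_m
    x = z.copy()                                                # (1) x_1 = z
    for _ in range(k):                                          # (2)
        r = b - A @ x                                           # (a) residual
        cs = [np.linalg.solve(Am, R @ r) for Am, R in zip(coarse, Rs)]  # (b) independent solves
        x = x + sum(P @ c for P, c in zip(Ps, cs))              # (c) combine the corrections
    return x                                                    # (3) z = x_{k+1}

# Usage: two index blocks; R_m selects the unknowns of block m and P_m = R_m^T.
N = 8
I = np.eye(N)
Rs = [I[:4], I[4:]]
Ps = [R.T for R in Rs]
A = 4.0 * I - np.eye(N, k=1) - np.eye(N, k=-1)    # hypothetical diagonally dominant matrix
b = np.arange(1.0, N + 1)
x = pmg(np.zeros(N), b, 30, Ps, Rs, A)
print(np.allclose(x, np.linalg.solve(A, b)))
```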

The aggregation/disaggregation formulation of Algorithm PMG is simply

Algorithm PAD(z, b, k, {Π_i}, j):

|z = initial guess and approximate solution
|b = right-hand side
|k = number of iterations
|{Π_i} = set of projection matrices
|j = number of restriction/prolongation matrices

(1) Set $x_1 = z$.

(2) Do i = 1, 2, ..., k:

(a) Set $r_i = b - Ax_i$.

(b) Solve in parallel $(\Pi_m A \Pi_m)c_m = \Pi_m r_i$, $m = 1, \dots, j$.

(c) Set $x_{i+1} = x_i + c_1 + \cdots + c_j$.

(3) Set $z = x_{k+1}$.
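In Algorithm PAD the systems $(\Pi_m A \Pi_m)c_m = \Pi_m r_i$ are singular, so a particular solution has to be chosen. The sketch below assumes a symmetric $A$, orthogonal projections $\Pi_m = P_m P_m^T$ (so $R_m = P_m^T$), and the minimum-norm solution computed with a pseudoinverse; these are assumptions of the sketch. Under them, one PAD step coincides with one PMG step, which the last line checks numerically.

```python
import numpy as np

def sym_pinv(M, tol=1e-10):
    """Pseudoinverse of a symmetric matrix via eigh, dropping eigenvalues
    below tol * max|eigenvalue| so that round-off is not inverted."""
    w, V = np.linalg.eigh(M)
    keep = np.abs(w) > tol * np.abs(w).max()
    return (V[:, keep] / w[keep]) @ V[:, keep].T

def pad(z, b, k, Pis, A):
    """Algorithm PAD(z, b, k, {Pi_m}, j) with minimum-norm solutions of the singular systems."""
    solvers = [sym_pinv(Pi @ A @ Pi) for Pi in Pis]
    x = z.copy()                                             # (1)
    for _ in range(k):                                       # (2)
        r = b - A @ x                                        # (a)
        cs = [S @ (Pi @ r) for S, Pi in zip(solvers, Pis)]   # (b) independent solves
        x = x + sum(cs)                                      # (c)
    return x                                                 # (3)

rng = np.random.default_rng(1)
N = 6
G = rng.standard_normal((N, N))
A = G @ G.T + N * np.eye(N)                  # hypothetical symmetric positive definite matrix
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
Ps = [Q[:, :3], Q[:, 3:]]                    # orthonormal columns; R_m = P_m^T, so R_m P_m = I
Pis = [P @ P.T for P in Ps]                  # mutually orthogonal projections, Pi_1 + Pi_2 = I
b = rng.standard_normal(N)

x_pad = pad(np.zeros(N), b, 1, Pis, A)
x_pmg = sum(P @ np.linalg.solve(P.T @ A @ P, P.T @ b) for P in Ps)   # one PMG step from z = 0
print(np.allclose(x_pad, x_pmg))
```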

All of the $c_m$ determined at iteration $i$ influence all of the $c_m$ to be determined at iteration $i+1$. That is, the corrections interact, or interfere, from step to step in the iteration. This interference constitutes a propagation of information (a well-known feature of matrix iterative methods).

Remark: The coarse-level correction step of Algorithm MG is sometimes viewed as a preconditioning step for the iterative method referred to as the smoother. Likewise, the serial implementation of Algorithms PMG and PAD may be viewed as traditional block relaxation methods, where the blocks are $R_mAP_m$ with right-hand-side sections $R_mr_i$. For problems derived from grids (e.g., partial differential equation problems), our algorithms reveal more structure when viewed as grid algorithms than when viewed as abstract linear algebra algorithms.

If no good initial guess for Algorithm PMG is available, a parallel version of the nested iteration form of multilevel algorithms is of value:

Algorithm PNI(z, b, k, {P_i}, {R_i}, j):

|z = initial guess and approximate solution
|b = right-hand side
|k = number of iterations
|{P_i} = set of prolongation matrices
|{R_i} = set of restriction matrices
|j = number of restriction/prolongation matrices

(1) Set $r_1 = b - Az$.

(2) Solve in parallel $(R_mAP_m)c_m = R_mr_1$, $m = 1, \dots, j$.

(3) Set $z = z + P_1c_1 + \cdots + P_jc_j$.

(4) Call Algorithm PMG(z, b, k, {P_i}, {R_i}, j).

The corresponding formulation for aggregation/disaggregation, Algorithm PADNI, is obvious. Analysis of the serial form of this algorithm for two or more levels can be found in [2, 7, 8, 13].

Remark: At face value, Algorithms PNI and PADNI may seem unnecessary. We can prove that $k+1$ iterations of Algorithm PMG (PAD) starting from a zero initial guess are equivalent to $k$ iterations of Algorithm PNI (PADNI). However, Algorithm PNI (PADNI) saves the cost of computing a residual. While this cost is slight in the two-level case (particularly for problems arising from discretizing elliptic partial differential equations), it adds up when there are many levels and many aggregation problems per level.
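The equivalence in the remark can be checked directly. In the sketch below (the matrix, block partition, and right-hand side are hypothetical), Algorithm PNI with a zero initial guess uses $r_1 = b$ without a product with $A$, and then calls PMG. Its result matches $k+1$ PMG iterations from zero.

```python
import numpy as np

def aggregated_correction(r, A, Ps, Rs):
    """sum_m P_m (R_m A P_m)^{-1} R_m r: the j independent aggregated solves."""
    return sum(P @ np.linalg.solve(R @ A @ P, R @ r) for P, R in zip(Ps, Rs))

def pmg(z, b, k, Ps, Rs, A):
    """Algorithm PMG written with the helper above."""
    x = z.copy()
    for _ in range(k):
        x = x + aggregated_correction(b - A @ x, A, Ps, Rs)
    return x

def pni(z, b, k, Ps, Rs, A):
    """Algorithm PNI(z, b, k, {P_i}, {R_i}, j)."""
    r = b.copy() if not np.any(z) else b - A @ z    # (1): with z = 0, no product with A is needed
    z = z + aggregated_correction(r, A, Ps, Rs)     # (2)-(3)
    return pmg(z, b, k, Ps, Rs, A)                  # (4)

# Hypothetical example: nonsymmetric tridiagonal A, three index blocks.
N = 9
A = 4.0 * np.eye(N) - np.eye(N, k=1) - 2.0 * np.eye(N, k=-1)
I = np.eye(N)
Rs = [I[0:3], I[3:6], I[6:9]]
Ps = [R.T for R in Rs]
b = np.linspace(1.0, 2.0, N)
z0 = np.zeros(N)
print(np.allclose(pni(z0, b, 2, Ps, Rs, A), pmg(z0, b, 3, Ps, Rs, A)))
```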

These algorithms can be extended to any number of levels, resulting in a tree structure of problems to solve. In each case we prepend the letter G to a two-level algorithm name to denote the generalized form of the algorithm. We begin by defining

$$A_0 = A.$$

We sequence the aggregated problems into a tree representation:

                A_0
             /   |   \
          A_1   ...   A_{j_0}
          / \          / \
       ...   ...    ...   ...

Associated with any matrix $A_q$ is a sequence of $j_q \ge 0$ projection matrices $\{\Pi_{q,m}\}$. When $j_q > 0$, the $j_q$ aggregation problems associated with $A_q$ are

$$\Pi_{q,m} A_q \Pi_{q,m}\, c_{q,m} = \Pi_{q,m}\, r_q, \qquad 1 \le m \le j_q.$$

When $j_q = 0$, there are no further aggregation problems to solve in this part of the tree (i.e., we are at a local coarsest level). We solve the problem

$$A_q z = b,$$

usually directly. Further, we can substitute calls to our new algorithm for step (2b) of Algorithm PMG, with zero as the initial guess for $c_m$. This leads us to the following generalization of Algorithm PMG:
Algorithm GPMG(z, b, k, j, q):

|z = initial guess and approximate solution
|b = right-hand side
|k = number of iterations
|j = number of projection matrices
|q = aggregated matrix identification number

(1) If j = 0, then solve $A_q z = b$ (usually directly).

(2) If j > 0, then

(a) Set $x_1 = z$.

(b) Do i = 1, 2, ..., k:

(i) Set $r_i = b - A_q x_i$.

(ii) Solve in parallel $(\Pi_{q,m} A_q \Pi_{q,m})\, c_{q,m} = \Pi_{q,m} r_i$, $m = 1, \dots, j_q$, starting from an initial guess $c_{q,m} = 0$ by an appropriate call to Algorithm GPMG.

(iii) Set $x_{i+1} = x_i + c_{q,1} + \cdots + c_{q,j_q}$.

(c) Set $z = x_{k+1}$.
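The recursion in Algorithm GPMG can be sketched with an explicit tree. As an assumption for illustration, each aggregated problem is written in restriction/prolongation form, $(R A_q P)c = R r$, with child matrix $R A_q P$, rather than in the projection form $\Pi_{q,m} A_q \Pi_{q,m}$ used above. The `Node` class and `split` helper are hypothetical. In the usage example, 8 unknowns split into 4 + 4 and then 2 + 2 + 2 + 2, so every level keeps all 8 unknowns (the nontelescoping property). Because $A$ is block diagonal with respect to every split, one V cycle ($k = 1$) is exact.

```python
from dataclasses import dataclass, field

import numpy as np

@dataclass
class Node:
    """An aggregated matrix A_q in the tree; children holds (R, P, child) triples.
    An empty list means j_q = 0, i.e., a local coarsest level."""
    children: list = field(default_factory=list)

def gpmg(z, b, k, A, node):
    """Sketch of Algorithm GPMG in restriction/prolongation form."""
    if not node.children:                       # (1) j_q = 0: solve directly
        return np.linalg.solve(A, b)
    x = z.copy()                                # (2a)
    for _ in range(k):                          # (2b)
        r = b - A @ x                           # (i)
        corrections = []
        for R, P, child in node.children:       # (ii) independent, so parallelizable
            Ac = R @ A @ P
            c = gpmg(np.zeros(Ac.shape[0]), R @ r, k, Ac, child)
            corrections.append(P @ c)
        x = x + sum(corrections)                # (iii)
    return x                                    # (2c)

def split(n):
    """Hypothetical aggregates: the first and second halves of R^n (coordinate selection)."""
    I = np.eye(n)
    h = n // 2
    return [(I[:h], I[:h].T), (I[h:], I[h:].T)]

rng = np.random.default_rng(2)
A = np.zeros((8, 8))
for s in range(0, 8, 2):                        # block diagonal with 2 x 2 blocks
    A[s:s + 2, s:s + 2] = rng.standard_normal((2, 2)) + 3.0 * np.eye(2)

root = Node([(R, P, Node([(R2, P2, Node()) for R2, P2 in split(4)])) for R, P in split(8)])
b = rng.standard_normal(8)
x = gpmg(np.zeros(8), b, 1, A, root)            # one V cycle (k = 1)
print(np.allclose(x, np.linalg.solve(A, b)))
```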

The cases k = 1 and k = 2 correspond to the familiar V and W cycles of multigrid. Note that this is a nontelescoping algorithm in the sense that the total number of unknowns on every level is the same. Hence, all processors that can be effectively used on the finest level can also be used on all of the coarser levels in a natural fashion.

Serial  multilevel  algorithms  are  iterative.  Our  algorithms  can  be  shown  to  be  direct 
methods  under  certain  circumstances.  We  use  the  commutator  notation  (Lie  bracket) 

$$[A, B] = AB - BA.$$

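Let $e_i$ be the error at iteration $i$. Then the error propagator matrix $C_{PMG}$ for Algorithm PMG is defined by

$$e_{i+1} = C_{PMG}\, e_i,$$

where

$$C_{PMG} = I - \sum_{m=1}^{j} P_m (R_m A P_m)^{-1} R_m A$$

(see [10] for the derivation). Similarly, the error propagator matrix $C_{PAD}$ for Algorithm PAD can be derived as

$$C_{PAD} = \sum_{m=1}^{j} \left\{ (I - \Pi_m A)^{-1} - I \right\} (I - \Pi_m).$$

Based on the remark after the definition of Algorithm PNI, we can prove that $C_{PNI}^{(k)} = (C_{PMG})^{k+1}$ and $C_{PADNI}^{(k)} = (C_{PAD})^{k+1}$, where $C_{PNI}^{(k)}$ and $C_{PADNI}^{(k)}$ denote the error propagators of $k$ iterations of Algorithms PNI and PADNI, assuming a zero initial guess.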
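The definition of $C_{PMG}$ can be checked on a single iteration. The sketch below (random test matrix and block aggregates chosen for illustration) takes an arbitrary iterate $x_i$, performs one PMG step, and compares the new error $e_{i+1} = x^* - x_{i+1}$ with $C_{PMG}\, e_i$.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 6
A = rng.standard_normal((N, N)) + 2 * N * np.eye(N)   # hypothetical nonsingular test matrix
I = np.eye(N)
Rs = [I[:2], I[2:]]                                   # two aggregates with R_m P_m = I
Ps = [R.T for R in Rs]

# C_PMG = I - sum_m P_m (R_m A P_m)^{-1} R_m A
C = I - sum(P @ np.linalg.solve(R @ A @ P, R @ A) for P, R in zip(Ps, Rs))

x_star = rng.standard_normal(N)
b = A @ x_star
x = rng.standard_normal(N)                            # arbitrary current iterate x_i
r = b - A @ x
x_next = x + sum(P @ np.linalg.solve(R @ A @ P, R @ r) for P, R in zip(Ps, Rs))

e, e_next = x_star - x, x_star - x_next
print(np.allclose(e_next, C @ e))
print(max(abs(np.linalg.eigvals(C))))                 # asymptotic convergence factor of PMG
```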

We can prove several single-iteration convergence theorems. We use this terminology since the correction algorithms (PMG and PAD) use k = 1 and the nested iteration algorithms (PNI and PADNI) use k = 0.

THEOREM 1: Let $\sum_{i=1}^{j} \Pi_i = I$. Then Algorithms PMG(k=1), PAD(k=1), PNI(k=0), and PADNI(k=0) converge to the exact solution in one iteration (assuming infinite-precision arithmetic) if $[\Pi_i, A] = 0$ for all $1 \le i \le j$.

The proofs are based on showing that

$$C_{PAD} = \sum_{i=1}^{j} (I - \Pi_i A)^{-1} [\Pi_i, A] (I - \Pi_i) \tag{2.3}$$

and that

$$C_{PMG} = -\sum_{i=1}^{j} P_i (R_i A P_i)^{-1} R_i\, [\Pi_i, A]. \tag{2.4}$$
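Identity (2.4) and the one-iteration result of Theorem 1 can both be checked numerically. The sketch below assumes the hypotheses $\sum_i \Pi_i = I$ and $R_i P_i = I$. In the first case, $A$ is arbitrary and (2.4) holds, although $C_{PMG} \ne 0$. In the second, a symmetric $A$ is built so that the columns of $P_1$ and $P_2$ span invariant subspaces. Then $[\Pi_i, A] = 0$ and $C_{PMG}$ vanishes. All matrices are hypothetical test data.

```python
import numpy as np

def commutator(X, Y):
    return X @ Y - Y @ X

def c_pmg(A, Ps, Rs):
    """C_PMG = I - sum_m P_m (R_m A P_m)^{-1} R_m A."""
    return np.eye(A.shape[0]) - sum(P @ np.linalg.solve(R @ A @ P, R @ A) for P, R in zip(Ps, Rs))

def c_pmg_commutator(A, Ps, Rs):
    """Right-hand side of (2.4): -sum_i P_i (R_i A P_i)^{-1} R_i [Pi_i, A]."""
    return -sum(P @ np.linalg.solve(R @ A @ P, R @ commutator(P @ R, A)) for P, R in zip(Ps, Rs))

rng = np.random.default_rng(4)
N = 6

# Case 1: arbitrary A, coordinate aggregates (sum of Pi_i = I, R_i P_i = I).
A = rng.standard_normal((N, N)) + N * np.eye(N)
I = np.eye(N)
Rs = [I[:3], I[3:]]
Ps = [R.T for R in Rs]
print(np.allclose(c_pmg(A, Ps, Rs), c_pmg_commutator(A, Ps, Rs)))

# Case 2: [Pi_i, A] = 0, so Theorem 1 predicts convergence in one iteration.
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
A_sym = Q @ np.diag(np.arange(1.0, N + 1)) @ Q.T
Ps2 = [Q[:, :3], Q[:, 3:]]
Rs2 = [P.T for P in Ps2]
print(np.allclose(c_pmg(A_sym, Ps2, Rs2), 0.0))
```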

COROLLARY 2: Theorem 1 remains true for Algorithms PMG and PNI if the condition $[\Pi_i, A] = 0$ is replaced by $\Pi_i A (\Pi_i - I) = 0$ for all $i$, $1 \le i \le j$.

COROLLARY 3: For $1 \le i \le j_q$, let the $\{p_{q,i}\}$ and $\{R_{q,i}, P_{q,i}\}$ associated with each $A_q$ satisfy $R_{q,i}P_{q,i} = I$ and

$$R_{q,i}: \mathbb{R}^N \to \mathbb{R}^{p_{q,i}}, \qquad P_{q,i}: \mathbb{R}^{p_{q,i}} \to \mathbb{R}^N, \qquad \sum_{i=1}^{j_q} p_{q,i} = N, \quad\text{and}\quad \sum_{i=1}^{j_q} \Pi_{q,i} = I.$$

Then Algorithms GPMG(k=1), GPAD(k=1), GPNI(k=0), and GPADNI(k=0) converge to the exact solution in one iteration (assuming infinite-precision arithmetic) if, for every $q$, $[\Pi_{q,i}, A_q] = 0$ for all $i$, $1 \le i \le j_q$.

The proofs can be found in [9, 10]. The proofs of these theorems use the fact that each of the products $RP = I$ (ignoring subscripts). In fact, this requirement can be weakened, requiring only that each product $RP$ be invertible.

Quantitative results on the rate of convergence of the algorithms may be expressed in terms of the norms of the commutators appearing in (2.3) and (2.4). The interference (or influence) of each aggregate problem on all of the other aggregate problems diminishes as these norms decrease. At the same time, the interference is more constructive when the norms are small.


3.  IMPLEMENTATION  ISSUES 

There are several models of parallel computers. We discuss only fine-grain and coarse-grain parallel computers. A fine-grain computer is one with a massive number of processors (thousands or millions). Each processor is usually of low computational power, with the overall speed of the machine being determined by sheer quantity. Typically, these are single instruction, multiple data (SIMD) machines. A coarse-grain computer is one with a small number of processors (a few dozen or less). Each processor can be anything from a microprocessor to a supercomputer-class computer. Typically, these are multiple instruction, multiple data (MIMD) machines.

First,  consider  the  coarse  grain  computer  model.  We  will  give  a  rough  outline  of  how  to 
implement  these  algorithms  and  then  give  a  specific  example.  On  machines  with  local 
memories,  it  is  usually  more  efficient  to  distribute  pieces  of  the  various  matrices  to  local 
memories.  A  careful  implementation  can  allow  this  to  go  on  simultaneously  with  the  beginning 
of  the  computation.  During  the  computation,  the  only  data  that  must  be  transmitted  between 
processors  are  the  pieces  of  vectors  (residuals  and  corrections)  which  cannot  be  computed  (or 
recomputed)  by  a  processor. 

On  a  coarse  grained  machine  with  shared  memory,  communication  time  is  not  a  concern. 
Rather,  data  must  be  distributed  in  such  a  way  that  memory  contention  is  avoided.  This  can  be 
dealt  with  in  a  similar  manner  as  the  nonshared  memory  case. 

We illustrate these ideas on the simplest form of a coarse-grain computer, namely, a two-processor model. Consider the two-dimensional Poisson equation:

$$-\Delta u = f \ \text{ in } \Omega = (0,1)\times(0,1); \qquad u = 0 \ \text{ on } \partial\Omega, \tag{3.1}$$
where $f \in L^2(\Omega)$. Setting $H = H_0^1(\Omega)$, where $H_0^1$ is the usual Sobolev space of functions that vanish on $\partial\Omega$, the weak form of (3.1) is: find $u \in H$ such that for all $v \in H$,

$$a(u,v) = \int_\Omega \nabla u \cdot \nabla v \, dx = (f,v) = \int_\Omega f v \, dx.$$

We discretize this using either central differences on a uniform mesh or finite elements on a uniform triangulation using $C^0$ piecewise linear polynomials and the usual nodal basis functions. In either case, after discretization, we get a matrix $A$ that is block tridiagonal [16]:

$$A = [-I,\ Q,\ -I],$$

where $Q = [-1, 4, -1]$ is a $\sqrt{N}\times\sqrt{N}$ tridiagonal matrix. We use the transposes of the restriction matrices as the corresponding prolongation matrices. For even $N$, we define


$R_1, R_2 \in \mathbb{R}^{(N/2)\times N}$ [explicit matrix displays not legible in the source scan].

For odd $N$, an additional row with a one in the middle column is added to the bottom of $R_1$, and $R_2$ has a zero column inserted in the middle. These restriction matrices yield

$$[A, \Pi_1] = [A, \Pi_2] = 0.$$


Theorem 1 predicts that Algorithms PMG, PAD, PNI, and PADNI will converge in one iteration, independent of $N$. We have verified this in practice.
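The one-iteration behavior can be reproduced with restriction matrices of the kind described above. Since the printed $R_1$, $R_2$ are not legible here, the sketch uses an assumed construction based on index reversal $k \leftrightarrow N-1-k$ in lexicographic ordering. Rows of $R_1$ are symmetric pairs and rows of $R_2$ antisymmetric pairs, with entries $\sqrt{2}/2$, and the odd-$N$ rule stated in the text is applied. The 5-point matrix $A = [-I, Q, -I]$ is unchanged by this reversal, so $[A, \Pi_1] = [A, \Pi_2] = 0$. These need not be the paper's matrices.

```python
import numpy as np

def poisson_2d(n):
    """Block tridiagonal A = [-I, Q, -I] with Q = [-1, 4, -1]; n x n blocks, N = n * n."""
    Q = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    S = np.eye(n, k=1) + np.eye(n, k=-1)
    return np.kron(np.eye(n), Q) - np.kron(S, np.eye(n))

def reflection_restrictions(N):
    """Hypothetical R_1, R_2 from the index reversal k <-> N-1-k (symmetric/antisymmetric pairs).
    For odd N, R_1 gets an extra row with a one in the middle column, and R_2 has a zero there."""
    h = N // 2
    s = np.sqrt(2.0) / 2.0
    R1 = np.zeros((N - h, N))
    R2 = np.zeros((h, N))
    for k in range(h):
        R1[k, k], R1[k, N - 1 - k] = s, s
        R2[k, k], R2[k, N - 1 - k] = s, -s
    if N % 2:
        R1[h, h] = 1.0
    return R1, R2

results = []
for n in (4, 5):                                  # N = 16 (even) and N = 25 (odd)
    N = n * n
    A = poisson_2d(n)
    R1, R2 = reflection_restrictions(N)
    Rs, Ps = [R1, R2], [R1.T, R2.T]
    b = np.random.default_rng(n).standard_normal(N)
    # One PMG iteration from z = 0, with the two aggregated solves independent:
    x = sum(P @ np.linalg.solve(R @ A @ P, R @ b) for P, R in zip(Ps, Rs))
    results.append((N, np.allclose(x, np.linalg.solve(A, b))))
print(results)
```

Each aggregated matrix $R_m A P_m$ has roughly $N/2$ unknowns, which matches the two-processor split discussed next.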

These restriction matrices were chosen knowing that the multigrid analysis for (3.1) simplifies to the study of $N/4$ $4\times 4$ error propagation matrices (see [BankDouglas85, 7, 8]). $R_1$ was chosen to match the pairing of eigencomponents from the known analysis.

Consider either Algorithm PMG or PNI. One way of minimizing communication is to let half of the rows of $A$ and half of the fine-level vectors reside in each processor. In step (2a), each processor can compute half of the residual vector locally by transferring only $\sqrt{N}$ data elements of the vector $x_i$ (the portions at the middle of the vector) from the other processor. In step (2b), we can compute the portions of $R_mr_i$ locally for the portions of $r_i$ that reside locally and then pass information to the other processor. Alternatively, we can swap halves of $r_i$ and compute each $R_mr_i$ locally. Step (2c) is computed similarly to step (2b), except that we probably want to update the halves of $x_{i+1}$ in the appropriate processor (with respect to the discussion about step (2a)). One iteration of Algorithm PMG requires no more than $2(N+\sqrt{N})$ real words to be communicated. While this is not optimal, only one iteration is required for the algorithm to converge.

Now consider fine-grain computers. Most parallel multilevel algorithms require either $O(N)$ (e.g., [4]) or $O(N\log N)$ processors (e.g., [12]). Our algorithms are, in principle, nontelescoping, i.e., the sum of the ranks of the problems on each level is always $N$. Hence, we do not have to resort to awkward techniques to keep all of the processors busy, nor do we have to turn off a growing number of processors as computation proceeds on coarser and coarser levels. In [14], the V cycle is promoted since more of the processors are kept active by keeping most of the computation on the finer levels. Our algorithms can do most of their computation on the coarser levels since all of the processors are kept busy. Further, our algorithms only require $O(N)$ processors.

We note that just because we have a fine-grained computer, there is no reason to assume that we are required to solve a massive number of coarse-level problems. Instead, we will probably have about the same number of coarse-level problems as before, but we will compute on the levels differently. For example, we can do the residual computation as one parallel operation with one processor allocated to each unknown. Communication, in effect, becomes an exercise in minimizing wire lengths.

On  coarse  grained  parallel  processors,  solving  the  coarsest  level  problems  directly  probably 
means  some  form  of  (sparse)  Gaussian  elimination.  On  fine  grained  computers,  this  probably 
means  using  a  fast  iterative  method,  such  as  a  preconditioned  conjugate  gradient-like  algorithm. 
Since  the  number  of  iterations  required  to  solve  a  problem  by  conjugate  gradients  (or  any  other 
matrix  iterative  algorithm)  is  dependent  on  the  size  of  the  matrix,  having  a  collection  of  small 
problems  reduces  the  overall  running  time. 

We leave the details of which matrices should be stored, and why, to the description of MADPACK (see [5, 6]), which contains a serial implementation of these ideas (along with other algorithms). This topic is highly problem dependent and not suitable for this paper. For example, one extreme is Poisson's equation on a rectangle with a uniform grid; there is no reason to store any of the matrices in this case. On the other hand, certain matrices should be stored when a problem is on a very complicated grid (with a nonuniform mesh) whose coefficients are the solution of a nonlinear partial differential equation. We note that this issue arises in standard multigrid implementations, with similar conclusions.


4.  SUMMARY 

We  have  developed  parallel  algorithms  which  are  appropriate  for  both  coarse  and  fine  grained 
parallel  computers.  For  problems  arising  in  partial  differential  equations,  these  algorithms 
require  very  little  communication  between  processors.  Unlike  previous  multilevel  algorithms, 
there  are  no  smoothing  steps.  Instead,  multiple  coarse  level  problems  are  solved  in  parallel. 

This leads to a natural mapping of the processors onto the coarse-level problems so that all processors are always in use. In addition, the new parallel algorithms are more efficient than the standard multilevel algorithms.

We have in no way answered all of the relevant questions. We are currently studying much more general problems and how to pick the restriction and prolongation matrices that minimize the convergence factor. One danger in aiming for one-iteration convergence is that these methods may no longer be cost effective. There are two ways to deal with this problem. The first is to design restriction and prolongation matrices that require more than one iteration to converge to an acceptable solution. The second is to generate a large enough set of aggregate problems so that the coarse-level solves are small enough to be computationally attractive. Our findings will be reported in [11].
INTRODUCTION

Spectral multigrid methods [9,13,17,18] combine the accuracy of spectral discretizations with the efficiency and flexibility of multigrid solution techniques. To date they have been implemented in an exclusively two-dimensional setting, with applications to elliptic model problems [9,17,18] and to compressible potential flows [13]. In this paper, spectral multigrid methods are extended to three-dimensional periodic problems and applied to the large-eddy simulation of turbulent flow. This work represents one realization of the prospective large-scale applications of spectral multigrid methods that were discussed in [19].

In section 2, the concept of large-eddy simulation is introduced and the discretized Helmholtz equations are formulated. Section 3 discusses several multigrid algorithms suitable for Poisson and Helmholtz equations. Numerical results on the model problems are discussed in section 4. Finally, the multigrid algorithms developed in the previous sections are incorporated into the full time-dependent turbulence simulation. Numerical results are given in section 5.

1.  INCOMPRESSIBLE  HOMOGENEOUS  TURBULENCE 

Large-eddy simulation (LES) models the small spatial scales of a turbulent flow as a function of the large-scale variables [2,5,10,11,12,15]. A spatial filter applied to the velocities produces the large-scale velocities, from which the small spatial scales have been removed. The validity of LES rests on the assumption that the small-scale statistics are insensitive to geometry away from solid boundaries. Inasmuch as this criterion is satisfied, a good LES model is applicable to a wide variety of configurations. Alternatively, the Reynolds-averaged Navier-Stokes equations result from time averaging the Navier-Stokes equations [14]. The resulting perturbation velocities, the difference between the true velocities and the averaged velocities, become the velocity fluctuations in time and contain information at all spatial length scales. Consequently, Reynolds-averaged turbulence models are expected to be more geometry dependent than LES models.

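In non-dimensional form, the conventional Navier-Stokes equations are given by

$$\frac{\partial \mathbf{v}}{\partial t} + \nabla\cdot(\mathbf{v}\mathbf{v}) = -\nabla p + \nabla\cdot(\nu\nabla\mathbf{v}) \tag{1}$$

$$\nabla\cdot\mathbf{v} = 0 \tag{2}$$

where $\mathbf{v}$ is the velocity vector, $p$ is the static pressure, and $\nu$, the kinematic viscosity, is assumed to be constant.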

Any flow variable $\gamma$ can be spatially filtered in the following manner:

$$\bar\gamma(\mathbf{x}) = \int_D G(\mathbf{x} - \mathbf{z}, \Delta)\,\gamma(\mathbf{z})\,d\mathbf{z} \tag{3}$$

where $G$ is a filter function, $\Delta$ is the computational mesh size, and $D$ is the domain of the fluid. With a suitable (low-pass) filter function $G$, Eq. (3) substantially reduces the amplitude of the high-frequency spatial Fourier components of any flow variable $\gamma$. Consequently, $\bar\gamma$ can be more accurately termed the large-scale part of $\gamma$.
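On a periodic domain, the convolution in (3) becomes a multiplication in Fourier space, which makes the damping of high-frequency components easy to see. The text does not fix $G$ at this point. The sketch assumes a Gaussian filter, whose transform is $\exp(-k^2\Delta^2/24)$, on a one-dimensional periodic grid, with a filter width of four mesh spacings.

```python
import numpy as np

def gaussian_filter_periodic(f, delta, length=2 * np.pi):
    """Filter periodic samples f with a Gaussian kernel of width delta (assumed form of G),
    applied as multiplication by exp(-k^2 delta^2 / 24) in Fourier space."""
    n = f.size
    k = 2 * np.pi * np.fft.fftfreq(n, d=length / n)   # angular wavenumbers
    G_hat = np.exp(-(k * delta) ** 2 / 24.0)
    return np.real(np.fft.ifft(G_hat * np.fft.fft(f)))

def mode_amplitude(g, m):
    """Amplitude of the sin/cos mode with wavenumber m in the samples g."""
    return 2 * abs(np.fft.fft(g)[m]) / g.size

n = 64
x = 2 * np.pi * np.arange(n) / n
gamma = np.sin(x) + np.sin(20 * x)          # one large-scale and one small-scale mode
delta = 4 * (2 * np.pi / n)                 # hypothetical filter width: four mesh spacings
gamma_bar = gaussian_filter_periodic(gamma, delta)
gamma_prime = gamma - gamma_bar             # small-scale part, as in the decomposition that follows
print(mode_amplitude(gamma_bar, 1), mode_amplitude(gamma_bar, 20))
```

The $k = 1$ mode passes almost unchanged, while the $k = 20$ mode is reduced to under 10% of its amplitude and is left mostly in the small-scale part.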

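The turbulent fields are decomposed into their large- and small-scale components based on the prescription

$$\gamma = \bar\gamma + \gamma' \tag{4}$$

where $\gamma'$ is the part representative of the small spatial scales. Direct filtering of the momentum equation yields

$$\frac{\partial \bar{\mathbf{v}}}{\partial t} + \nabla\cdot(\bar{\mathbf{v}}\bar{\mathbf{v}}) = -\nabla\bar p + \nabla\cdot(\nu\nabla\bar{\mathbf{v}}) + \nabla\cdot\tau \tag{5}$$

$$\nabla\cdot\bar{\mathbf{v}} = 0 \tag{6}$$

where

$$\tau_{kl} = -\left(\overline{\bar v_k\bar v_l} - \bar v_k\bar v_l + \overline{\bar v_k v'_l} + \overline{v'_k\bar v_l} + \overline{v'_k v'_l}\right) \tag{7}$$

is the subgrid-scale stress tensor. This tensor can be decomposed into

$$L_{kl} = -\left(\overline{\bar v_k\bar v_l} - \bar v_k\bar v_l\right) \tag{8}$$

$$C_{kl} = -\left(\overline{\bar v_k v'_l} + \overline{v'_k\bar v_l}\right) \tag{9}$$

$$R_{kl} = -\overline{v'_k v'_l} \tag{10}$$

which are, respectively, the subgrid-scale Leonard, cross, and Reynolds stresses [6].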

The deviatoric (i.e., trace-free) part of the subgrid-scale Reynolds stress tensor, $R^D$, is approximated by the Smagorinsky model

$$R^D_{kl} = \nu_E\, S^D_{kl} \tag{11}$$

where $\nu_E$ is the velocity-dependent eddy viscosity
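$$\nu_E(\mathbf{x}) = 2 C_R \Delta^2\, \mathrm{II}_S^{1/2} \tag{12}$$

$$S_{kl} = \frac{1}{2}\left(\frac{\partial \bar v_k}{\partial x_l} + \frac{\partial \bar v_l}{\partial x_k}\right) \tag{13}$$

$$\mathrm{II}_S = S_{kl}S_{kl} \tag{14}$$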

(i.e., $S$ is the Favre-filtered rate-of-strain tensor, while $\mathrm{II}_S$ is its second invariant) and $C_R$ is the Smagorinsky constant (the Einstein summation convention for repeated indices is assumed). The cross and Leonard subgrid-scale stresses are approximated with the Bardina model [1]. Together with the subgrid-scale Reynolds stresses, one obtains the linear combination model [1]

$$\tau_{kl} = -\left(\overline{\bar v_k\bar v_l} - \bar{\bar v}_k\bar{\bar v}_l\right) + \nu_E\, S^D_{kl}. \tag{15}$$


For  purposes  of  numerical  computation,  the  subgrid-scale  stresses  are  partitioned  into 
the  subgrid-scale  Reynolds  stress  and  the  remaining  terms.  The  latter  terms  contain  no 
derivatives  of  velocity,  and  are  therefore  treated  explicitly  along  with  the  advection  term. 

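Substitution of the subgrid-scale stress (15) into Eq. (5) transforms the momentum equation into

$$\frac{\partial\bar{\mathbf v}}{\partial t} + \nabla\cdot(\bar{\mathbf v}\bar{\mathbf v}) = -\nabla P + \nabla\cdot\left[(\nu + \nu_E)\nabla\bar{\mathbf v}\right] + \nabla\cdot(L + C) \tag{16}$$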

where the isotropic components of the total subgrid-scale stress have been lumped together with the pressure to produce a new pressure variable, $P$, defined by

$$P = \bar p - \tfrac{1}{3}L_{kk} - \tfrac{1}{3}C_{kk} - \tfrac{1}{3}R_{kk}. \tag{17}$$

The  subscript  I  indicates  that  only  the  trace  of  the  tensor  is  considered. 

For the isotropic turbulence problem, equations (16) and (6) are solved in a cubic computational domain, periodic in all three spatial directions. Fourier spectral methods are an established approach to this problem [10,12]. The solution is obtained in two steps. In the first step, the convective terms and the Leonard and cross subgrid-scale stresses are treated explicitly, while the viscous terms are treated with an implicit algorithm. In the second step, a Poisson equation is solved for the pressure to ensure that the velocity field remains divergence-free. For convenience, the overbars are removed hereafter from the primitive variables, and it is understood that the variables refer to spatially filtered quantities. For a first-order time discretization, the first step thus solves

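$$\mathbf{v}^* = \mathbf{v}^n - \Delta t\left[\boldsymbol{\omega}\times\mathbf{v} - \nabla\cdot(L + C)\right]^n + \Delta t\,\nabla\cdot\left[(\nu + \nu_E)\nabla\mathbf{v}^*\right]. \tag{18}$$

Note that the momentum equation is used in rotation form, with vorticity $\boldsymbol{\omega} = \nabla\times\mathbf{v}$. The pressure therefore acquires the additional term $\tfrac{1}{2}|\mathbf{v}|^2$. As a result of the first step, one obtains an intermediate velocity field $\mathbf{v}^*$, which serves as the initial condition for the correction stage

$$\mathbf{v}^{n+1} = \mathbf{v}^* - \Delta t\,\nabla P^{n+1} \tag{19}$$

$$\nabla\cdot\mathbf{v}^{n+1} = 0 \tag{20}$$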
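On the periodic cube, the correction stage (19)-(20) can be solved exactly in Fourier space. Taking the divergence of (19) and using (20) gives $\Delta t\,\nabla^2 P^{n+1} = \nabla\cdot\mathbf{v}^*$, and $\nabla^2 \to -|\mathbf{k}|^2$. The sketch below implements this on $[0, 2\pi]^3$. The test field is a hypothetical one: a divergence-free part plus the gradient $\nabla(\sin x)$. The gradient part should be removed, and $P = \sin x/\Delta t$ should result. The field contains no Nyquist-mode content, which this sketch assumes.

```python
import numpy as np

def project(u, v, w, dt):
    """Correction stage (19)-(20) on the periodic cube [0, 2pi]^3, solved in Fourier space.
    Returns the divergence-free velocity and the pressure P (with zero mean)."""
    n = u.shape[0]
    k1 = np.fft.fftfreq(n, d=1.0 / n)                      # integer wavenumbers
    kx, ky, kz = np.meshgrid(k1, k1, k1, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    k2[0, 0, 0] = 1.0                                      # avoid 0/0; the mean of P is arbitrary
    uh, vh, wh = np.fft.fftn(u), np.fft.fftn(v), np.fft.fftn(w)
    div_h = 1j * (kx * uh + ky * vh + kz * wh)             # transform of div v*
    Ph = -div_h / (dt * k2)                                # dt * (-|k|^2) P_hat = div_hat
    Ph[0, 0, 0] = 0.0
    new = [np.real(np.fft.ifftn(ch - dt * 1j * kc * Ph))   # v* - dt grad P
           for ch, kc in ((uh, kx), (vh, ky), (wh, kz))]
    return new[0], new[1], new[2], np.real(np.fft.ifftn(Ph))

n, dt = 16, 0.1
g = 2 * np.pi * np.arange(n) / n
X, Y, Z = np.meshgrid(g, g, g, indexing="ij")
u_star = np.sin(X) * np.cos(Y) + np.cos(X)                # divergence-free part + grad(sin x)
v_star = -np.cos(X) * np.sin(Y)
w_star = np.zeros_like(X)
u1, v1, w1, P = project(u_star, v_star, w_star, dt)
print(np.allclose(u1, np.sin(X) * np.cos(Y)), np.allclose(P, np.sin(X) / dt))
```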

For spectral collocation algorithms, a direct solution of Eq. (18) is not feasible because
the matrices representing the diffusion operators are full. The alternative discussed
here is to use iterative methods, and in particular spectral multigrid (SMG) methods,
for the solution. The second step of the splitting algorithm can be transformed into a
Poisson equation with constant coefficients: taking the divergence of Eq. (19) and using
Eq. (20) gives ∇²P^{n+1} = ∇·v*/Δt, which is solved exactly in Fourier space.
In practice, the explicit terms are advanced with a third-order Runge-Kutta algorithm,
while the implicit diffusion terms are approximated with a Crank-Nicolson scheme. The
implicit equations are solved at each of the three Runge-Kutta stages.
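The pressure step can be sketched in a few lines. The following is a minimal 2-D illustration under stated assumptions, not the authors' code: it evaluates a naive discrete Fourier transform instead of an FFT, drops the N/2 mode from first derivatives, and the helper names (`dft2`, `idft2`, `project`) are hypothetical. Solving ∇²P = ∇·v*/Δt in Fourier space and substituting into Eq. (19), Δt cancels and the correction reduces to removing the component of each Fourier coefficient of v* along its wavevector k.

```python
import cmath
import math


def signed_k(index, n):
    """Map a DFT index to the signed wavenumber range -n/2+1 .. n/2."""
    return index if index <= n // 2 else index - n


def dft2(u):
    """Naive 2-D forward DFT with the 1/n^2 normalization of Eq. (24)."""
    n = len(u)
    return [[sum(u[a][b] * cmath.exp(-2j * math.pi * (p * a + q * b) / n)
                 for a in range(n) for b in range(n)) / n**2
             for q in range(n)] for p in range(n)]


def idft2(c):
    """Naive 2-D inverse DFT (Eq. (23)); the imaginary part is round-off only."""
    n = len(c)
    return [[sum(c[p][q] * cmath.exp(2j * math.pi * (p * a + q * b) / n)
                 for p in range(n) for q in range(n)).real
             for b in range(n)] for a in range(n)]


def project(vx, vy):
    """Return the divergence-free part of (vx, vy) on a periodic n x n grid.

    Solving lap(P) = div(v*)/dt in Fourier space and applying
    v = v* - dt grad(P) gives v_hat = v*_hat - k (k . v*_hat) / |k|^2.
    """
    n = len(vx)
    cx, cy = dft2(vx), dft2(vy)
    for p in range(n):
        for q in range(n):
            # the N/2 mode is dropped from first derivatives
            kx = 0 if p == n // 2 else signed_k(p, n)
            ky = 0 if q == n // 2 else signed_k(q, n)
            k2 = kx * kx + ky * ky
            if k2 == 0:
                continue  # the mean flow is unaffected by a pressure gradient
            k_dot_v = kx * cx[p][q] + ky * cy[p][q]
            cx[p][q] -= kx * k_dot_v / k2
            cy[p][q] -= ky * k_dot_v / k2
    return idft2(cx), idft2(cy)
```

Adding the gradient of sin x sin 2y to the solenoidal field (sin x cos y, −cos x sin y) and calling `project` recovers the solenoidal field to round-off, since spectral differentiation is exact for these modes.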


2.  SPECTRAL  REPRESENTATION 

When the solution to a numerical problem is approximated by a truncated series of
appropriate global basis functions, the solution is said to have a spectral representation.
The  method  of  projecting  the  solution  onto  the  basis  function  space  determines  the  type 
of  spectral  approximation:  Galerkin,  tau  or  collocation.  Spectral  methods  are  explained 
thoroughly  in  [4,9]  and  a  summary  of  their  applications  to  fluid  dynamics  is  provided  in  [7]. 
In  this  paper,  only  collocation  methods  are  considered,  since  they  are  better  suited  to  the 
solution  of  non-linear  and  variable  coefficient  problems.  Periodicity  further  restricts  us  to 
a  Fourier  representation  of  the  primitive  variables. 

Consider the three-dimensional periodic function u(r) on the domain [0, 2π]³ and its
truncated Fourier representation

$$u^{h}(\mathbf{r}) = \sum_{k_x=-N_x/2+1}^{N_x/2}\ \sum_{k_y=-N_y/2+1}^{N_y/2}\ \sum_{k_z=-N_z/2+1}^{N_z/2} \hat{u}_{k_x,k_y,k_z}\, e^{i(k_x x + k_y y + k_z z)} \qquad (21)$$

The superscript on u, henceforth omitted, refers to the set of collocation points

$$\Omega_h = \{(x_j, y_k, z_l)\} = \{(j h_x,\ k h_y,\ l h_z)\}, \qquad 0 \le j < N_x,\ 0 \le k < N_y,\ 0 \le l < N_z \qquad (22)$$

where h = (h_x, h_y, h_z) = (2π/N_x, 2π/N_y, 2π/N_z). The numbers of collocation points in the
x, y and z directions are N_x, N_y and N_z, respectively. To ensure spectral accuracy, u(r) must be a
C^∞ function. The values of u at the collocation points (x_m, y_n, z_p) and the Fourier
coefficients û_{k_x,k_y,k_z} are related through the pair of discrete Fourier transforms


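$$u_{m,n,p} = \sum_{k_x=-N_x/2+1}^{N_x/2}\ \sum_{k_y=-N_y/2+1}^{N_y/2}\ \sum_{k_z=-N_z/2+1}^{N_z/2} \hat{u}_{k_x,k_y,k_z}\, e^{i(k_x x_m + k_y y_n + k_z z_p)} \qquad (23)$$

$$\hat{u}_{k_x,k_y,k_z} = \frac{1}{N_x N_y N_z}\sum_{m=0}^{N_x-1}\ \sum_{n=0}^{N_y-1}\ \sum_{p=0}^{N_z-1} u_{m,n,p}\, e^{-i(k_x x_m + k_y y_n + k_z z_p)} \qquad (24)$$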
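A one-dimensional version of the transform pair (23)-(24) makes the wavenumber ordering explicit. This sketch evaluates the sums directly (O(N²) work rather than the O(N log N) of an FFT), and the function names are illustrative only.

```python
import cmath
import math


def wavenumbers(n):
    """Signed wavenumbers -n/2+1, ..., n/2, listed in DFT index order."""
    return [k if k <= n // 2 else k - n for k in range(n)]


def forward(u):
    """1-D form of Eq. (24): u_hat_k = (1/N) sum_m u_m exp(-i k x_m), x_m = 2 pi m / N."""
    n = len(u)
    xs = [2 * math.pi * m / n for m in range(n)]
    return [sum(um * cmath.exp(-1j * k * x) for um, x in zip(u, xs)) / n
            for k in wavenumbers(n)]


def inverse(u_hat):
    """1-D form of Eq. (23): u_m = sum_k u_hat_k exp(i k x_m)."""
    n = len(u_hat)
    xs = [2 * math.pi * m / n for m in range(n)]
    return [sum(c * cmath.exp(1j * k * x) for c, k in zip(u_hat, wavenumbers(n)))
            for x in xs]
```

For u = cos 3x sampled at 16 points, `forward(u)` holds 0.5 at the indices of k = 3 and k = −3 and round-off elsewhere, and `inverse(forward(u))` recovers the samples.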
First and second derivatives of u are obtained simply by differentiating Eq. (21) term
by term and evaluating the result at the collocation points. For example, the first and
second derivatives in the x-direction are

$$\left.\frac{\partial u}{\partial x}\right|_{m,n,p} = \sum_{k_x=-N_x/2+1}^{N_x/2}\ \sum_{k_y=-N_y/2+1}^{N_y/2}\ \sum_{k_z=-N_z/2+1}^{N_z/2} i k_x\,\hat{u}_{k_x,k_y,k_z}\, e^{i(k_x x_m + k_y y_n + k_z z_p)} \qquad (25)$$

$$\left.\frac{\partial^2 u}{\partial x^2}\right|_{m,n,p} = -\sum_{k_x=-N_x/2+1}^{N_x/2}\ \sum_{k_y=-N_y/2+1}^{N_y/2}\ \sum_{k_z=-N_z/2+1}^{N_z/2} k_x^2\,\hat{u}_{k_x,k_y,k_z}\, e^{i(k_x x_m + k_y y_n + k_z z_p)} \qquad (26)$$

Derivatives in the y and z directions have similar expressions. When first derivatives
are evaluated, the highest mode in the direction of differentiation (the N/2 mode) is removed,
since it makes a purely imaginary contribution to the first derivative at the collocation
points.
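The effect of dropping the N/2 mode can be checked in one dimension. In this sketch (a hypothetical helper using a direct O(N²) transform), the N/2 mode is zeroed only for odd-order derivatives. For the samples (−1)^m = cos(N x_m/2), keeping that mode would give the purely imaginary values i(N/2)(−1)^m, whereas the true first derivative vanishes at every collocation point.

```python
import cmath
import math


def spectral_derivative(u, order=1, keep_nyquist=False):
    """Differentiate periodic samples on [0, 2*pi) term by term (1-D form of Eqs. (25)-(26)).

    Returns complex values; the imaginary parts vanish (to round-off) unless
    the N/2 mode is kept in an odd-order derivative.
    """
    n = len(u)
    ks = [k if k <= n // 2 else k - n for k in range(n)]
    xs = [2 * math.pi * m / n for m in range(n)]
    u_hat = [sum(um * cmath.exp(-1j * k * x) for um, x in zip(u, xs)) / n for k in ks]
    d_hat = []
    for k, c in zip(ks, u_hat):
        if order % 2 == 1 and k == n // 2 and not keep_nyquist:
            c = 0  # remove the N/2 mode, as described in the text
        d_hat.append((1j * k) ** order * c)
    return [sum(c * cmath.exp(1j * k * x) for c, k in zip(d_hat, ks)) for x in xs]
```

Second derivatives keep the N/2 mode, since (iN/2)² is real and reproduces −(N/2)² cos(N x_m/2) exactly.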

In  many  problems,  one  is  required  to  solve  large  systems  of  equations  on  fine  grids. 
However,  direct  methods  are  often  impractical  because  of  the  size  of  the  problem,  and 
standard  iterative  methods  have  very  slow  convergence  rates.  Typically,  the  high  frequency 
components  of  the  error  damp  out  quickly,  while  there  is  a  very  slow  decay  of  the  error 
on  the  larger  scale  lengths.  Such  relaxation  schemes  smooth  out  the  error  very  quickly. 
Multigrid  methods  accelerate  the  convergence  of  iterative  methods  by  recognizing  that  low 
frequency  errors  on  a  fine  grid  become  high  frequency  errors  on  a  coarse  grid.  Therefore, 
the  smoothed  residual  is  interpolated  onto  a  coarser  grid,  and  a  new  set  of  equations,  similar 
to  the  original  set,  is  solved.  The  coarsening  process  is  continued  until  a  sufficiently  coarse 
grid  is  reached  on  which  a  direct  solution  procedure  is  relatively  inexpensive.  From  the 
coarsest  grid  solution,  a  solution  on  the  next  finer  grid  is  obtained  by  prolongation  of  the 
coarse  grid  correction  onto  the  next  finer  grid,  optionally  followed  by  several  relaxation 
sweeps  to  eliminate  the  high  frequency  errors  introduced  by  the  interpolation  process.  In 
general,  therefore,  multigrid  algorithms  have  three  components:  a  restriction  operator  to 
transfer  residual  information  from  the  finer  to  coarser  grids,  a  prolongation  operator  to 
extend  a  coarse  grid  correction  to  the  next  finer  level,  and  a  smoothing  algorithm  whose 
objective  is  to  reduce  the  high  frequency  components  on  a  given  level.  There  exists  an 
extensive  literature  on  multigrid  algorithms.  Several  good  review  papers  appear  in  [8]. 

Spectral multigrid distinguishes itself from other types of multigrid approaches in the
choice of the restriction and prolongation operators. In the problem considered here, all
functions are periodic, which leads naturally to a truncated Fourier series representation.
Following [17], restriction of a variable from a fine to a coarse grid consists of the following
steps: transform the variable to Fourier space, reject the highest modes not resolvable on
the coarse grid, and transform back to physical space on the coarse grid. Prolongation
is done in a similarly straightforward manner: after the variable is transformed to Fourier
space, additional terms with zero coefficients are added to the Fourier series, and the newly
defined function is transformed back to physical space on the fine grid. Contrary
to the more popular interpolation methods used in the finite-difference context, which
always introduce high frequency components into the solution, the spectral interpolation
just described is exact for solutions to the constant coefficient Helmholtz equation.
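Both grid transfer operators can be sketched in one dimension as follows. This is an illustration under stated assumptions: the transforms are evaluated directly instead of with FFTs, the coarse grid has half as many points, and modes with |k| ≥ N_c/2 (including the coarse-grid N/2 mode, assumed filtered) are discarded. The function names are hypothetical.

```python
import cmath
import math


def _coefficients(u):
    """Fourier coefficients of periodic samples, keyed by signed wavenumber."""
    n = len(u)
    ks = [k if k <= n // 2 else k - n for k in range(n)]
    return {k: sum(um * cmath.exp(-2j * math.pi * k * m / n) for m, um in enumerate(u)) / n
            for k in ks}


def _synthesize(coeffs, n):
    """Evaluate a real truncated Fourier series at n equispaced points."""
    return [sum(c * cmath.exp(2j * math.pi * k * m / n) for k, c in coeffs.items()).real
            for m in range(n)]


def restrict(u_fine):
    """Fine -> coarse: transform, reject modes not resolvable on the coarse grid, transform back."""
    n_coarse = len(u_fine) // 2
    kept = {k: c for k, c in _coefficients(u_fine).items() if abs(k) < n_coarse // 2}
    return _synthesize(kept, n_coarse)


def prolong(u_coarse):
    """Coarse -> fine: extend the series with zero coefficients, transform back on the fine grid."""
    n_coarse = len(u_coarse)
    kept = {k: c for k, c in _coefficients(u_coarse).items() if abs(k) < n_coarse // 2}
    return _synthesize(kept, 2 * n_coarse)
```

Because the transfers act mode by mode, prolonging a band-limited coarse-grid function reproduces its fine-grid values to round-off, and restriction simply discards the fine-grid-only modes.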
Fast Fourier transform (FFT) methods permit the basic interpolation calculations to
be performed in O(N³ log N) floating point operations when the number of nodes is
equal in all three directions, N_x = N_y = N_z = N. The grid transfer operators and the
residual calculation are both based on FFTs, the former to interpolate variables between
different grids, and the latter to evaluate first and second derivatives at the
grid points. Therefore the overall multigrid scheme has an operation count proportional
to N³ log N.


3.  RELAXATION  SCHEME 


The use of FFTs to calculate the residual restricts the choice of relaxation schemes to
simultaneous relaxation schemes such as Jacobi and Richardson. These relaxation schemes
are implemented in physical space. Consider the constant coefficient scalar Poisson
equation

$$\nabla^2 u = f(\mathbf{r}) \qquad (27)$$

The Richardson scheme is one of the simplest smoothers. Applied to Eq. (27), one smoothing
step updates the solution as

$$u \leftarrow u - \omega r \qquad (28)$$

where r = f − ∇²u is the residual and ω is the relaxation parameter. Both stationary (fixed ω)
and non-stationary (variable ω) schemes are considered. Equation (28) admits a Fourier analysis. If
a single three-dimensional Fourier mode (k_x, k_y, k_z) is substituted into Eq. (28), the smoothing
rate μ becomes

$$\mu(\mathbf{k}) = \left|1 - \omega\,(k_x^2 + k_y^2 + k_z^2)\right| \qquad (29)$$

In the context of multigrid methods, the objective is to minimize the smoothing rate of the
high frequencies, which are seen by the fine grid but are not resolvable on the coarser ones.
Given an existing grid Ω_h, the next coarser level is obtained by defining the set of collocation
points Ω_{2h}. The range of wavenumbers over which the minimization is performed is the
difference between the two cubes (in wavenumber space) [0, N/2]³ and [0, N/4]³. Strictly
speaking, because both the mean mode and the N/2 mode have been filtered out of the
right-hand side f(r), the wavenumbers considered for the minimization should lie in the
region defined by the difference between the cubes [1, N/2 − 1]³ and [1, N/4]³.

A straightforward calculation, which balances |1 − ωλ| at the smallest high-frequency
eigenvalue (N/4)² and the largest one, N_d(N/2)², leads to an optimal smoothing rate of

$$\mu = \frac{4N_d - 1}{4N_d + 1} \qquad (30)$$

for the Richardson iteration scheme, where N_d is the number of spatial dimensions. When
N_d = 3, μ = 11/13 ≈ 0.85. Equation (30) shows that the asymptotic smoothing rate
increases with the spatial dimension: as the number of dimensions increases from 1 to 3,
μ increases from 0.6 to 0.85. For 3-D problems, the optimal relaxation parameter is

$$\omega = \frac{2}{(N/4)^2 + 3(N/2)^2} = \frac{32}{13N^2} \qquad (31)$$
Operators that are spectrally discretized have a wider spread of eigenvalues than their
finite-difference counterparts. For example, the optimal Richardson smoothing rate for a
second-order, central difference discretization of the constant coefficient Poisson equation is

$$\mu_{FD} = \frac{N_d - \tfrac{1}{2}}{N_d + \tfrac{1}{2}} \qquad (32)$$

In contrast to the spectral smoothing rates, μ_FD ranges from 0.33 to 0.71 for 1-, 2- and 3-D
problems.
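The smoothing rates quoted above follow from balancing |1 − ωλ| at the two ends of the high-frequency eigenvalue range. The sketch below (hypothetical helper names, exact rational arithmetic) assumes that range is [(N/4)², N_d(N/2)²] for the spectral Laplacian, i.e. the N → ∞ limit behind Eq. (30), and [2/h², 4N_d/h²] for the second-order central-difference Laplacian, whose 1-D symbol (4/h²)sin²(θ/2) is evaluated for π/2 ≤ θ ≤ π. A brute-force pass over the integer wavenumber lattice checks that the finite-N spectral rate does not exceed the asymptotic value.

```python
import itertools
from fractions import Fraction


def optimal_richardson(lam_min, lam_max):
    """Minimax Richardson parameter and rate for eigenvalues in [lam_min, lam_max]."""
    omega = 2 / (lam_min + lam_max)
    mu = (lam_max - lam_min) / (lam_max + lam_min)
    return omega, mu


def spectral_bounds(n_dims, n):
    """High-frequency eigenvalue range of the spectral Laplacian (continuum limit)."""
    n = Fraction(n)
    return (n / 4) ** 2, n_dims * (n / 2) ** 2


def fd_bounds(n_dims, h=Fraction(1)):
    """High-frequency eigenvalue range of the second-order central-difference Laplacian."""
    return 2 / h**2, 4 * n_dims / h**2


def lattice_rate(n_dims, n, omega):
    """max |1 - omega |k|^2| over integer k with n/4 < max|k_i| <= n/2."""
    worst = Fraction(0)
    for k in itertools.product(range(n // 2 + 1), repeat=n_dims):
        if max(k) > n // 4:
            worst = max(worst, abs(1 - omega * sum(ki * ki for ki in k)))
    return worst
```

For N_d = 1, 2, 3 this gives 3/5, 7/9 and 11/13 for the spectral operator and 1/3, 3/5 and 5/7 for the finite-difference one, and the 3-D spectral parameter is 32/(13N²).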

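Brandt, Fulton and Taylor [3] applied the residual averaging technique to accelerate
the convergence of Richardson's smoothing algorithm for two-dimensional Fourier
representations. The extension to the present three-dimensional problem is straightforward.
The smoothing algorithm now satisfies

$$u_{mnp} \leftarrow u_{mnp} - \frac{4\pi^2}{N^2}\,\mathcal{A}\,r_{mnp} \qquad (33)$$

where (m, n, p) is the grid point to which the smoother is applied and A is the averaging
template. A is the three-dimensional array

$$\mathcal{A} = \left[\begin{pmatrix}\delta & \gamma & \delta\\ \gamma & \beta & \gamma\\ \delta & \gamma & \delta\end{pmatrix},\ \begin{pmatrix}\gamma & \beta & \gamma\\ \beta & \alpha & \beta\\ \gamma & \beta & \gamma\end{pmatrix},\ \begin{pmatrix}\delta & \gamma & \delta\\ \gamma & \beta & \gamma\\ \delta & \gamma & \delta\end{pmatrix}\right] \qquad (34)$$

If the elements of A are denoted by A_{ijk}, the three arrays in the above expression correspond
(from left to right) to k = 1, 2, 3. The parameter δ is associated with the 8 corner points
of the cube centered at (m, n, p) about which the averaging is performed. In conventional
notation, the averaging template A applied to the residual at (m, n, p) produces
the expression

$$\mathcal{A}\,r_{m,n,p} = \Big[\alpha\sum_{|i|+|j|+|k|=0} + \beta\sum_{|i|+|j|+|k|=1} + \gamma\sum_{|i|+|j|+|k|=2} + \delta\sum_{|i|+|j|+|k|=3}\Big]\, r_{m+i,\,n+j,\,p+k}, \qquad i, j, k \in \{-1, 0, 1\} \qquad (35)$$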

Such  a  scheme  is  called  weighted  residual  averaging  (WRA).  A  Fourier  analysis  applied  to 
Eq.  (33)  yields 

$$\mu(k_x,k_y,k_z;\alpha,\beta,\gamma,\delta) = \Big|1 - \frac{4\pi^2}{N^2}(k_x^2 + k_y^2 + k_z^2)\big[\alpha + 2\beta(\cos\theta_x + \cos\theta_y + \cos\theta_z) + 4\gamma(\cos\theta_y\cos\theta_z + \cos\theta_z\cos\theta_x + \cos\theta_x\cos\theta_y) + 8\delta\cos\theta_x\cos\theta_y\cos\theta_z\big]\Big| \qquad (36)$$

where (θ_x, θ_y, θ_z) is shorthand for (2π/N)(k_x, k_y, k_z). The solution to the minimax
problem

$$\mu = \min_{\alpha,\beta,\gamma,\delta}\ \max_{\theta_x,\theta_y,\theta_z}\ \mu(\theta_x,\theta_y,\theta_z;\alpha,\beta,\gamma,\delta) \qquad (37)$$

yields the optimum parameters α, β, γ and δ, as well as the smoothing rate μ. This problem is solved
numerically. The angles lie in the region formed by the difference between the two cubes
[0, π]³ and [0, π/2]³. To demonstrate the importance of all four averaging coefficients, μ
was calculated by successively increasing the number of non-zero coefficients. The results
are presented in Table 1, where a comparison is made with the 2-D results of Brandt, Fulton
and Taylor [3]. In both 2-D and 3-D, μ decreases substantially with the help of residual
averaging. As expected, the minimum 3-D spectral radius is higher than the optimal 2-D
value.

Table 1: Optimal averaging parameters and smoothing rates for the weighted residual
averaging scheme.

  α       β       γ        δ        μ (3-D)   μ (2-D) [3]
  0.062   0.000   0.0      0.0      0.852     0.777
  0.101   0.145   0.0      0.0      0.608     0.472
  0.120   0.287   0.0083   0.0      0.453     0.106
  0.144   0.042   0.0185   0.0085   0.195     -
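Equation (36) can be checked directly: because the template of Eq. (35) is symmetric, applying it to a single Fourier mode multiplies that mode by the bracketed factor in Eq. (36). The sketch below assumes a periodic N³ grid stored as nested lists and angles θ = 2πk/N; the coefficient values are the last row of Table 1, used only as sample inputs, and the helper names are hypothetical.

```python
import itertools
import math


def weighted_residual_average(r, alpha, beta, gamma, delta):
    """Apply the 3x3x3 averaging template to a periodic array r[m][n][p]."""
    size = len(r)
    weight = {0: alpha, 1: beta, 2: gamma, 3: delta}  # keyed by |i| + |j| + |k|
    out = [[[0.0] * size for _ in range(size)] for _ in range(size)]
    for m, n, p in itertools.product(range(size), repeat=3):
        out[m][n][p] = sum(
            weight[abs(i) + abs(j) + abs(k)]
            * r[(m + i) % size][(n + j) % size][(p + k) % size]
            for i, j, k in itertools.product((-1, 0, 1), repeat=3))
    return out


def template_symbol(theta, alpha, beta, gamma, delta):
    """Bracketed factor of Eq. (36) at angles theta = (theta_x, theta_y, theta_z)."""
    cx, cy, cz = (math.cos(t) for t in theta)
    return (alpha + 2 * beta * (cx + cy + cz)
            + 4 * gamma * (cy * cz + cz * cx + cx * cy)
            + 8 * delta * cx * cy * cz)
```

The 6 face, 12 edge and 8 corner neighbours contribute 2β Σcos θ_i, 4γ Σcos θ_i cos θ_j and 8δ ∏cos θ_i, respectively, which is where the factors 2, 4 and 8 in Eq. (36) come from.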

3.1  Variable  Coefficient  Poisson  Equation 

The analysis of the previous section is exact for the constant coefficient Poisson equation.
More generally, one wishes to solve

$$\nabla\cdot a(\mathbf{r})\nabla u = f(\mathbf{r}) \qquad (38)$$

where a(r) is a strictly positive C^∞ function. Although convergence is no longer
theoretically guaranteed, good results are obtained if the residual is first divided by a(r) before
averaging. Alternatively, one can use Eq. (33) after replacing 4π²/N² by 4π²/(a(r)N²).
Both approaches yield similar results. The results presented herein are based on the
former method.

3.2  Helmholtz  Equation 

With the good convergence rates achieved for the Poisson equation, attention is now
focused on the three-dimensional Helmholtz equation

$$\nabla\cdot a\nabla u - \lambda u = f(\mathbf{r}) \qquad (39)$$

where a(r) is a C^∞ function and λ is a positive constant. The iteration scheme and its
associated convergence rate are, respectively,

$$u \leftarrow u - \omega\left(f(\mathbf{r}) - \nabla\cdot a\nabla u + \lambda u\right) \qquad (40)$$

and

$$\mu = \left|1 - \omega\,(a k^2 + \lambda)\right|, \qquad k^2 = k_x^2 + k_y^2 + k_z^2 \qquad (41)$$

where, as previously, the number of grid points is assumed to be equal in all directions
(N_x = N_y = N_z = N). The optimum smoothing rate is

$$\mu = \frac{11/13}{1 + \dfrac{32}{13}\dfrac{\lambda}{N^2 a}} \qquad (42)$$

which is exact for constant coefficient a. (It is assumed that the k_x = k_y = k_z = 0 mode is
solved for exactly on the coarsest grid.) Note that the difference in smoothing rates of the
Poisson and Helmholtz operators decreases with increasing grid size.


Experiments were also performed with a non-stationary Richardson smoothing algorithm.
If κ is the condition number

$$\kappa = \frac{\lambda_{\max}}{\lambda_{\min}} \qquad (43)$$

of the discrete spectral operator ∇·a∇ − λ, the jth relaxation parameter ω_j of a
k-parameter cycle is

$$\omega_j = \frac{2/\lambda_{\min}}{(\kappa - 1)\cos\left(\frac{(2j-1)\pi}{2k}\right) + (\kappa + 1)}, \qquad j = 1, \dots, k \qquad (44)$$

and the corresponding smoothing rate is

$$\mu = \left[T_k\!\left(\frac{\kappa + 1}{\kappa - 1}\right)\right]^{-1/k} \qquad (45)$$

where T_k is the Chebyshev polynomial of degree k; this is the solution to a standard minimax
problem [16]. The frequencies that are preferentially damped lie in the high-frequency region
defined earlier, whose eigenvalues span [λ_min, λ_max]. As functions of λ and of the coefficient
a(r), the minimum and maximum eigenvalues of the discrete Helmholtz operator over this region are

$$\lambda_{\min} = \frac{aN^2}{16} + \lambda, \qquad \lambda_{\max} = \frac{3aN^2}{4} + \lambda \qquad (46)$$

and the sequence of optimal relaxation parameters in the non-stationary Richardson
iteration scheme becomes

$$\omega_j = \frac{32}{aN^2 + 16\lambda}\ \frac{1}{(\kappa - 1)\cos\left(\frac{(2j-1)\pi}{2k}\right) + (\kappa + 1)} \qquad (47)$$

The number of terms in the sequence is set equal to the number of smoothing sweeps. For
stationary Richardson, k = 1 and Eq. (47) reduces to

$$\omega = \frac{32}{13aN^2}\ \frac{1}{1 + \dfrac{32}{13}\dfrac{\lambda}{N^2 a}} \qquad (48)$$

while μ is given by Eq. (42). If the values of λ_min and λ_max from Eq. (46) are inserted into the
condition number defined by Eq. (43), it is clear that κ, and with it μ, decreases when λ is
increased, and increases toward the Poisson limit κ = 12 when N is increased.
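The entries of Table 2 can be reproduced from Eq. (46) and the Chebyshev relaxation parameters. In the sketch below (hypothetical helpers; constant a = 1 and N = 32 assumed, the grid for which the k = 1 entries agree with Eq. (42)), `chebyshev_rate` evaluates the closed form [T_k((κ+1)/(κ−1))]^{−1/k} via cosh and arccosh, and `sampled_rate` checks it by evaluating the k-sweep amplification ∏_j |1 − ω_j λ| on a fine sample of [λ_min, λ_max].

```python
import math


def helmholtz_bounds(n, lam, a=1.0):
    """Eq. (46): extreme high-frequency eigenvalues of the discrete Helmholtz operator."""
    return a * n**2 / 16 + lam, 3 * a * n**2 / 4 + lam


def chebyshev_parameters(lam_min, lam_max, k):
    """Relaxation parameters omega_j, j = 1..k, of a k-parameter Richardson cycle."""
    kappa = lam_max / lam_min
    return [(2 / lam_min)
            / ((kappa - 1) * math.cos((2 * j - 1) * math.pi / (2 * k)) + kappa + 1)
            for j in range(1, k + 1)]


def chebyshev_rate(lam_min, lam_max, k):
    """Average rate per sweep, [T_k((kappa + 1)/(kappa - 1))]^(-1/k)."""
    kappa = lam_max / lam_min
    x = (kappa + 1) / (kappa - 1)
    return math.cosh(k * math.acosh(x)) ** (-1 / k)


def sampled_rate(lam_min, lam_max, k, samples=4000):
    """Worst-case prod_j |1 - omega_j lam| over sampled lam, raised to the power 1/k."""
    omegas = chebyshev_parameters(lam_min, lam_max, k)
    worst = 0.0
    for i in range(samples + 1):
        lam = lam_min + (lam_max - lam_min) * i / samples
        worst = max(worst, math.prod(abs(1 - w * lam) for w in omegas))
    return worst ** (1 / k)
```

For k = 1 the closed form reduces to (κ − 1)/(κ + 1), which is Eq. (42) with a = 1.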


Table 2 confirms that μ decreases with increasing λ and k. In practice, a 3-cycle scheme
is sufficient to reduce the L2 norm of the residual by a factor of 5. An alternate formulation
of the Richardson method is obtained when the Helmholtz term is treated implicitly, i.e.,

$$u \leftarrow \frac{u - \omega_P\left(f(\mathbf{r}) - \nabla\cdot a\nabla u\right)}{1 + \lambda\,\omega_P} \qquad (49)$$

or equivalently

$$u \leftarrow u - \omega_H\left(f(\mathbf{r}) - \nabla\cdot a\nabla u + \lambda u\right) \qquad (50)$$

where ω_H is the implicit Helmholtz relaxation parameter

$$\omega_H = \frac{\omega_P}{1 + \lambda\,\omega_P} \qquad (51)$$

If ω_P is the optimum relaxation parameter for the constant coefficient Poisson operator,
given by

$$\omega_P = \frac{32}{13N^2} \qquad (52)$$

ω_H is identical to the value obtained in Eq. (48) for a = 1. Therefore the two stationary
algorithms are identical for all λ > 0. However, they differ from one another for non-stationary
Richardson. Indeed, in the implicit formulation, one chooses the ω_j to optimize the convergence
of the Poisson operator, which leads to relaxation parameters independent of λ. The λ dependence is
introduced as an extra positive term in the denominator of Eq. (49). This is in contrast
to ω_j given by Eq. (47), where λ appears in the coefficient of the cosine function. Table 3
illustrates the differences between the two approaches when non-stationary Richardson is
used with cycles of varying length. The two methods give approximately identical smoothing
rates, except in the limit of large λ, where the implicit method gives slightly better
performance. From a practical point of view, it is cheaper to evaluate the acceleration
parameters for the implicit scheme because a factor 1/a can be factored out of ω_j and
combined with the residual. This is not possible for the explicit formulas, which depend on
r.
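The equivalence of the two stationary formulations can be verified mode by mode. For a = 1, a Fourier mode with |k|² = k2 is multiplied by 1 − ω(k2 + λ) in the explicit scheme, Eq. (40), and by (1 − ω_P k2)/(1 + λω_P) in the implicit update, Eq. (49). The sketch below (hypothetical helper names, exact rational arithmetic) checks that the two factors coincide when ω = ω_H from Eq. (51), and that ω_H equals 32/(13N² + 32λ), the stationary value from Eq. (48) with a = 1.

```python
from fractions import Fraction


def explicit_factor(omega, k2, lam):
    """Amplification of one Fourier mode by explicit Richardson, Eq. (40) with a = 1."""
    return 1 - omega * (k2 + lam)


def implicit_factor(omega_p, k2, lam):
    """Amplification of the same mode by the implicit update of Eq. (49) with a = 1."""
    return (1 - omega_p * k2) / (1 + lam * omega_p)


def implicit_helmholtz_parameter(omega_p, lam):
    """Eq. (51): omega_H = omega_P / (1 + lam omega_P)."""
    return omega_p / (1 + lam * omega_p)
```

For non-stationary cycles the two families of ω_j are chosen differently, so the amplification factors no longer coincide; Table 3 quantifies that difference.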


Table 2: Smoothing rates for non-stationary Richardson iteration applied to the
Helmholtz equation.

  k    μ (λ = 0)   μ (λ = 10)   μ (λ = 50)
  1    0.846       0.826        0.755
  2    0.747       0.720        0.631
  3    0.689       0.661        0.573
  4    0.655       0.628        0.542
  5    0.634       0.607        0.524
  6    0.619       0.593        0.512

Table 3: Comparison of convergence rates of explicit versus implicit non-stationary
Richardson iteration algorithms (each entry: explicit/implicit).

  Grid size    32³           64³           128³
               0.652/0.651   0.654/0.654   0.655/0.655
               0.628/0.618   0.648/0.645   0.653/0.653
               0.542/0.511   0.621/0.610   0.646/0.643
4. IMPLEMENTATION


The stationary and non-stationary relaxation schemes were implemented in a simple
V-cycle formulation, which is described in detail in [17,18] for the two-dimensional Poisson
equation. Results are based on a fine grid of 32³ and three coarser grids of 16³, 8³ and 4³. A
constant number of relaxation sweeps on each grid on the upward (N_u) and downward
(N_d) branches of the V-cycle was found to provide good overall smoothing rates for all
the cases that were considered. An effective rule of thumb is to decrease the residual by
about an order of magnitude on each grid. This leads to N_d = 2 for WRA, because of its low
smoothing rate, and N_d = 3 for the non-stationary Richardson (NSR) schemes. This
corresponds to an approximate reduction of the L2 norm of the residual by factors of order 0.05
and 0.2 for WRA and NSR, respectively (per N_d fine grid relaxations). In all tests,
N_u = 1, because the variable coefficient a introduces high frequencies into the
residual after a prolongation.
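To make the V-cycle structure concrete, here is a one-dimensional spectral multigrid sketch for the constant coefficient Poisson equation u'' = f. It is not the code used for the results below: it uses stationary Richardson with the 1-D optimum ω = 32/(5N²) (balancing |1 − ωk²| at k = N/4 and k = N/2) instead of WRA or NSR, evaluates transforms directly instead of with FFTs, solves the 4-point coarsest problem exactly in Fourier space, and filters the mean and N/2 modes out of every residual. N_d = 2 and N_u = 1 follow the choices above; all function names are hypothetical.

```python
import cmath
import math


def wavenumbers(n):
    return [k if k <= n // 2 else k - n for k in range(n)]


def forward(u):
    n = len(u)
    return [sum(v * cmath.exp(-2j * math.pi * k * m / n) for m, v in enumerate(u)) / n
            for k in wavenumbers(n)]


def inverse(c):
    n = len(c)
    return [sum(ck * cmath.exp(2j * math.pi * k * m / n) for k, ck in zip(wavenumbers(n), c)).real
            for m in range(n)]


def residual(u, f):
    """r = f - u'' evaluated spectrally, with the mean and N/2 modes filtered out."""
    n = len(u)
    r_hat = [fk + k * k * uk for k, fk, uk in zip(wavenumbers(n), forward(f), forward(u))]
    r_hat[0] = r_hat[n // 2] = 0
    return inverse(r_hat)


def transfer(v, n_target):
    """Restriction or prolongation: keep modes resolvable on the coarser grid, zero the rest."""
    n_coarse = min(len(v), n_target)
    c = [0j] * n_target
    for k, vk in zip(wavenumbers(len(v)), forward(v)):
        if abs(k) < n_coarse // 2:
            c[k % n_target] = vk
    return inverse(c)


def v_cycle(u, f, nu_down=2, nu_up=1, n_coarsest=4):
    """One V-cycle for u'' = f on a periodic grid with a power-of-two number of points."""
    n = len(u)
    if n <= n_coarsest:
        # exact solve on the coarsest grid: -k^2 u_hat = f_hat for 0 < |k| < n/2
        return inverse([fk / (-k * k) if 0 < abs(k) < n // 2 else 0
                        for k, fk in zip(wavenumbers(n), forward(f))])
    omega = 32 / (5 * n * n)  # balances |1 - omega k^2| at k = n/4 and k = n/2
    for _ in range(nu_down):
        u = [ui - omega * ri for ui, ri in zip(u, residual(u, f))]
    coarse_rhs = transfer(residual(u, f), n // 2)
    correction = v_cycle([0.0] * (n // 2), coarse_rhs, nu_down, nu_up, n_coarsest)
    u = [ui + ci for ui, ci in zip(u, transfer(correction, n))]
    for _ in range(nu_up):
        u = [ui - omega * ri for ui, ri in zip(u, residual(u, f))]
    return u
```

Because every operator in this model problem is diagonal in Fourier space, each mode is damped only on the level where it counts as high frequency and is then handled exactly by the coarse-grid correction; the variable coefficient a(r) of the real problem breaks that decoupling.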

All the computations presented were performed on the Cray 2. Fast Fourier transforms
are coded in Fortran and achieve a 100 Mflop rate on grids of 64³ and above. Timings
may fluctuate by 10% to 20% for identical runs due to system load.


5.  RESULTS 


There are many different approaches to measuring the efficiency of a multigrid algorithm.
Ultimately, the user is interested in the total CPU time a code takes to complete
execution. However, this timing is strongly computer dependent, and on a given system,
the programmer's skill can greatly influence the results. It is therefore necessary to
supplement the CPU time with more intrinsic measures. The simplest measure is the smoothing
rate μ of the smoother on the finest grid. Unfortunately, μ does not take into account the
work done on the coarser grids, nor does it account for the time spent performing grid
transfers. Alternate criteria are required to take this work into account. One method is
to calculate the ratio of L2 norms of the residual after and before a single V-cycle, and
to calculate a smoothing rate per V-cycle as

$$\mu_V = \left(\frac{\|r\|_{\text{after}}}{\|r\|_{\text{before}}}\right)^{1/N_f} \qquad (53)$$

where N_f = N_u + N_d is the total number of fine grid sweeps during one V-cycle. Although
μ_V is still not an accurate measure of efficiency, since it does not take residual
transfers into account, it can help establish whether the high frequencies seen by each grid
are damped at the same rate. If μ and μ_V are unequal and the number of relaxations is the
same on every grid (except perhaps the coarsest), the frequency content of the error vector
is unevenly distributed, and the smoothing rates will be different on each grid. Therefore,
μ_V is useful as a diagnostic tool.

A better measure of the overall algorithm efficiency is obtained from the number of
equivalent fine relaxation sweeps, defined as

$$\eta = \frac{\text{CPU time per V-cycle}}{N_f \times \text{CPU time per fine grid relaxation}} \qquad (54)$$

The equivalent convergence rate

$$\mu_T = \left(\frac{\|r\|_{\text{after}}}{\|r\|_{\text{before}}}\right)^{1/(\eta N_f)} = \mu_V^{1/\eta} \qquad (55)$$

measures the decrease in the residual norm per fine sweep, taking the total multigrid
overhead into account. Together with the total CPU time per V-cycle, it allows the performance
of the algorithm to be ascertained. In what follows, performance is measured exclusively by μ
and μ_T. Processing time is a function of the number of relaxation sweeps on the way up
and down the V-cycle and of the grid size. These parameters are kept constant within a
given table, except in Table 7, where the effect of fine-grid resolution is studied. Therefore,
the decrease in the residual norm after a fixed number of multigrid cycles is an objective
measure of the algorithm's efficiency.
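As a small worked example of these measures, assume the reading in which one V-cycle costs ηN_f equivalent fine sweeps. Then, with ρ the residual-norm ratio over one V-cycle, μ_V = ρ^{1/N_f} and μ_T = ρ^{1/(ηN_f)} = μ_V^{1/η}. With N_f = N_d + N_u = 3 and the values μ ≈ 0.17 and η = 3.45 reported for the WRA Poisson runs, this gives μ_T ≈ 0.60, close to the tabulated 0.61. The helper names are hypothetical.

```python
def rate_per_v_cycle(ratio, n_fine_sweeps):
    """Residual reduction per fine-grid sweep: ratio ** (1 / N_f)."""
    return ratio ** (1 / n_fine_sweeps)


def equivalent_rate(ratio, n_fine_sweeps, eta):
    """Reduction per equivalent fine-grid relaxation: ratio ** (1 / (eta * N_f))."""
    return ratio ** (1 / (eta * n_fine_sweeps))


ratio = 0.17 ** 3  # one V-cycle with N_f = 3 fine sweeps at mu_V = 0.17
print(round(rate_per_v_cycle(ratio, 3), 3), round(equivalent_rate(ratio, 3, 3.45), 3))
```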

As explained in detail in [18], the N/2 Fourier mode must be filtered out of the residual
every time it is computed. In one dimension, this is done in physical space by projecting
the residual function onto the space orthogonal to (−1)^j, j = 1, …, N_x. In higher dimensions,
a sequence of 1-D filtering operations is performed. The filtering operation consumes
approximately 25% of the residual calculation on a 32³ grid. The influence of vectorization
is clearly seen from the decrease in the relative time spent filtering as the grid size increases:
as N increases from 32 to 128, the percentage of time spent filtering, measured
with respect to the total time spent calculating the residuals (which includes the filtering),
decreases from 25% to 13%. Equivalently, the percentage of time spent in 3-D derivative
evaluations during the computation of the residual increases from 67% to 77% as the grid
size increases from 32³ to 128³.
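The filtering step can be sketched as follows. Projecting a 1-D line of residual values onto the space orthogonal to (−1)^j removes exactly the N/2 mode, because that mode is orthogonal to all the others on the collocation grid; applying the 1-D filter along x, y and z lines in turn removes every mode with an N/2 component in any direction. Indices here start at 0 rather than 1, which only flips the sign of the basis vector and leaves the projection unchanged. The function names and nested-list storage are assumptions of this sketch.

```python
def remove_nyquist_1d(line):
    """Project out the component of line along (-1)**j."""
    n = len(line)
    coeff = sum((-1) ** j * v for j, v in enumerate(line)) / n
    return [v - coeff * (-1) ** j for j, v in enumerate(line)]


def remove_nyquist_3d(r):
    """Apply the 1-D filter along x, then y, then z lines of an n x n x n array r[i][j][k]."""
    n = len(r)
    out = [[list(r[i][j]) for j in range(n)] for i in range(n)]
    for j in range(n):
        for k in range(n):
            filtered = remove_nyquist_1d([out[i][j][k] for i in range(n)])
            for i in range(n):
                out[i][j][k] = filtered[i]
    for i in range(n):
        for k in range(n):
            filtered = remove_nyquist_1d([out[i][j][k] for j in range(n)])
            for j in range(n):
                out[i][j][k] = filtered[j]
    for i in range(n):
        for j in range(n):
            out[i][j] = remove_nyquist_1d(out[i][j])
    return out
```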

When solving the Poisson equation, the mean value of the right-hand side of the equation
must be filtered out to ensure convergence of the residual towards zero [18]. This is
done once at the beginning of the calculation.

All the numerical experiments were done with the Helmholtz equation (with λ set to
a constant value). The Poisson equation is obtained by setting λ to zero. The coefficient
a(r) is set to

$$a(\mathbf{r}) = 1 + \epsilon\, e^{\cos x + \cos y + \cos z} \qquad (56)$$

In all cases considered, the right-hand side of the Helmholtz equation is calculated to ensure
that the exact solution is

$$u_{ex}(\mathbf{r}) = \sin(N_x\pi\sin x)\,\sin(N_y\pi\sin y)\,\sin(N_z\pi\sin z) \qquad (57)$$

The factors N_x, N_y and N_z are included to ensure that the complete spectrum of spatial
scales is equally represented in the error vector. This ensures that μ = μ_V. Such a
solution, however, precludes a direct comparison of the computed and the exact solution,
because u_ex is no longer well represented by the collection of Fourier modes.

Convergence results for WRA are presented in Table 4. The smoothing rates obtained
for the constant coefficient Poisson equation are lower than the theoretical predictions.
This is mainly because the analytic results presented in Table 1 are only valid in
the limit N → ∞. Although a(r) varies by a factor of 20 across the physical domain when
ε = 1, μ remains close to the optimal value of 0.2. Experiments have shown that a large
degradation of μ occurs for ε greater than 1.5. Seven V-cycles reduced the residual by 10 to
11 orders of magnitude. Timings indicate that the number of equivalent fine relaxations is
3.45 on a 32³ grid. This is larger than expected from the work done on the sequence of
grids, obtained by summing the geometric series 1 + (1/2)³ + (1/4)³ + (1/8)³ + ⋯,
which leads to about 1.14 equivalent work units. The discrepancy between the theoretical and
numerical results is due to the time spent in the grid transfers, which is not included in
the geometric series, and to the inefficiency of the Cray 2 program on the coarser grids.

Table 4: Rates of convergence for the Poisson equation (ε as in Eq. (56); a(b) denotes a × 10^b).

             ε = 0.0   ε = 0.5   ε = 1.0
  μ          0.17      0.18      0.19
  μ_T        0.61      0.58      0.61
  ‖r‖        4(−11)    3(−10)    1(−10)

Large eddy simulations of turbulence require that the numerical scheme be capable of
resolving a wide range of spatial scales. Furthermore, the velocity fields have a
quasi-random distribution over the set of scales that survive the filtering. In
typical large eddy simulations, the smallest fluctuations seen by the LES code are on the
order of 4 fine grid cell widths (albeit with small amplitudes). It is therefore important
to ascertain the influence of spatial structure in a(r) on the smoothing rate of the Poisson
and Helmholtz operators. To this effect, the definition of the coefficient a(r) is slightly
modified according to

$$a(\mathbf{r}) = 1 + \epsilon\, e^{\cos(n_\epsilon x) + \cos(n_\epsilon y) + \cos(n_\epsilon z)} \qquad (58)$$

With ε = 0.5, a varies from 1 to about 10. The role of n_ε is to introduce high frequency
content into the coefficient without affecting its range: the higher n_ε, the
smaller the distance over which a undergoes its maximum variation. Table 5 shows that
small values of n_ε do not adversely affect the convergence rate of the iteration scheme.
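A quick numerical check of the statements about the range of a(r): assuming the form a = 1 + ε exp(cos n_ε x + cos n_ε y + cos n_ε z) (n_ε = 1 gives the earlier coefficient), the extremes 1 + εe^{−3} and 1 + εe^{3} are attained on a 16³ collocation grid for every n_ε dividing 8, so n_ε changes how fast a varies but not its range. The helper names are hypothetical.

```python
import itertools
import math


def coefficient(x, y, z, eps, n_eps=1):
    """a(r) = 1 + eps * exp(cos(n_eps x) + cos(n_eps y) + cos(n_eps z))."""
    return 1 + eps * math.exp(math.cos(n_eps * x) + math.cos(n_eps * y) + math.cos(n_eps * z))


def coefficient_range(n, eps, n_eps=1):
    """Minimum and maximum of a(r) over an n^3 collocation grid on [0, 2*pi)^3."""
    xs = [2 * math.pi * j / n for j in range(n)]
    values = [coefficient(x, y, z, eps, n_eps) for x, y, z in itertools.product(xs, repeat=3)]
    return min(values), max(values)
```

For ε = 1 the ratio max/min is about 20, and for ε = 0.5 the coefficient spans roughly 1 to 11.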


Table 5: Effect of high frequency content of a(r) on the smoothing rate

  n_ε    μ      μ_V
  1      0.17   0.17
  2      0.18   0.18
  4      0.55   0.55
  8      0.58   0.62
Table 6: Rates of convergence for the Helmholtz equation (each entry: ε = 0 / ε = 0.5; a(b) denotes a × 10^b)

          λ = 1         λ = 10        λ = 100       λ = 1000
  μ       0.68/0.68     0.66/0.67     0.49/0.62     0.16/0.42
  μ_T     0.87/0.88     0.86/0.86     0.77/0.83     0.63/0.76
  ‖r‖     3(−5)/1(−4)   1(−5)/2(−5)   5(−9)/2(−6)   1(−11)/2(−9)

However, the convergence rate quickly deteriorates for n_ε ≥ 4. Note that when the
variation of a becomes too rapid, there is a discrepancy between μ and μ_V. This probably
indicates a frequency imbalance across the various grid levels, due to grid transfer
operators that do not properly interpolate a(r) onto the coarser grids. Similar
experiments performed on the Helmholtz equation indicate that λ has a beneficial effect
on the smoothing rate as n_ε is increased. Of course, μ is higher than the worst Poisson
result, since residual averaging is not used.

The Helmholtz equation is numerically solved for several values of ε and λ. Non-stationary
Richardson with a cycle of 3 produces the results displayed in Table 6. Pairs
of numbers correspond to (ε = 0/ε = 0.5). As expected, for a fixed ε, μ decreases with
increasing λ due to the reduction of the condition number

$$\kappa = \frac{\tfrac{3}{4}aN^2 + \lambda}{\tfrac{1}{16}aN^2 + \lambda} \qquad (59)$$

For the larger values of λ, the effect of the non-constant coefficient a is more severe. Whereas
μ is almost unaffected by ε for λ < 40, the differences in smoothing rates are substantial
for λ ≥ 100. For instance, when λ = 1000, μ is approximately 3 times larger for ε = 0.5
than for ε = 0.0. Similar trends occur for μ_T. However, the influence of ε on μ_T is less
dramatic at high λ.

Optimal multigrid methods produce convergence rates independent of the grid size.
This is tested for the Helmholtz equation at ε = 0.5 and λ = 10. Table 7 indicates that
both μ and μ_T are approximately constant over fine grid sizes ranging from 32³ to
128³. In all cases, the coarsest grid level is 4³. Computer timings for the calculations are
also presented; they are normalized to 1 on the 64³ grid. In all cases, the code is stopped
after a fixed number of V-cycles. The increase in time spent on the 128³ grid relative to the 64³
grid agrees quite well with theoretical predictions. This is in contrast to the 32³ grid, which uses
about 70% more CPU time than allowed for by the O(N³ log N) scaling. Poor vectorization
on this grid is the probable cause of this discrepancy.
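The O(N³ log N) row of Table 7 is simple arithmetic, normalized to the 64³ grid (the base of the logarithm cancels). The sketch below, with a hypothetical helper name, reproduces that row and the roughly 70% excess measured on the 32³ grid.

```python
import math


def relative_cost(n, n_ref=64):
    """N^3 log N operation count relative to the reference grid."""
    return (n ** 3 * math.log(n)) / (n_ref ** 3 * math.log(n_ref))


predicted = {n: relative_cost(n) for n in (32, 64, 128)}
excess_32 = 0.18 / predicted[32] - 1  # measured 0.18 versus the N^3 log N prediction
```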


Table 7: Grid independence for the Helmholtz equation, ε = 0.5, λ = 10 (a(b) denotes a × 10^b)

  Grid size                        32³       64³       128³
  μ                                0.67      0.67      0.68
  μ_T                              0.86      0.85      0.85
  ‖r‖                              1.9(−5)   1.5(−4)   2.5(−5)
  CPU time (arbitrary units)       0.18      1.0       9.0
  O(N³ log N) (arbitrary units)    0.10      1.0       9.3

6.  LARGE  EDDY  SIMULATION 


The implicit stage of the LES equations requires the solution of the set of three scalar
Helmholtz equations given by Eq. (18). Defining

$$a(\mathbf{r}) = \frac{\nu + \nu_e(\mathbf{v})}{\langle \nu + \nu_e(\mathbf{v})\rangle} \qquad (61)$$

$$\lambda = \frac{2}{\langle \nu + \nu_e(\mathbf{v})\rangle\,\Delta t} \qquad (62)$$

where ⟨·⟩ denotes the domain average, Eq. (18) reduces to the three scalar Helmholtz equations

$$\nabla\cdot a(\mathbf{r})\nabla\mathbf{v}(\mathbf{r}) - \lambda\mathbf{v} = \tilde{\mathbf{v}} \qquad (63)$$

where ṽ is the flow velocity after the explicit step. The above definitions of a(r) and
λ ensure that ⟨a⟩ = 1, which allows the numerical results to be compared against the
results in the previous sections.
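The mapping from the LES viscosities to the model-problem parameters can be sketched as below, with ⟨·⟩ taken as the average over the collocation points. The eddy-viscosity values are placeholders rather than LES data, the field is stored as a flat list, the factor 2 in λ follows Eq. (62), and the function name is hypothetical.

```python
def les_helmholtz_parameters(nu, nu_e, dt):
    """Return a(r) and lambda from Eqs. (61)-(62); nu_e holds eddy-viscosity samples."""
    total = [nu + v for v in nu_e]
    mean_total = sum(total) / len(total)
    a = [t / mean_total for t in total]
    lam = 2 / (mean_total * dt)
    return a, lam


# placeholder values, not LES data
a, lam = les_helmholtz_parameters(nu=1e-3, nu_e=[0.0, 1e-3, 2e-3, 5e-3], dt=1e-2)
```

Small time steps or small mean viscosities give a large λ, which lowers the condition number in Eq. (59); this is consistent with the convergence threshold in λ reported next.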


A major difficulty, expected to reduce the efficiency of the multigrid implementation
compared with the theoretical and model problem results, is that a(r) is now, in
effect, a random function of the spatial coordinates. Numerical experiments indicate that
the multigrid algorithm fails to converge for λ below a certain threshold; in the current
code, this threshold is λ ≈ 100. Convergence rates and timings are shown in Table 8.
Calculations were performed on a 32³ grid.

Although equation (63) has the same functional form as the model equation, the effect of $a(\mathbf{r})$ and $\lambda$ on the overall multigrid efficiency relative to a purely explicit scheme is not easy to determine. One reason is the strong dependence of these parameters on the filter width $\Delta$, the kinematic viscosity $\nu$, and the Smagorinsky constant $C_R$. To better understand the interrelations between these parameters and the possible gain of a multigrid strategy, let

$$R = \frac{\Delta t_{\mathrm{diff}}}{\Delta t_{\mathrm{adv}}}, \qquad (64)$$

where $\Delta t_{\mathrm{adv}}$ and $\Delta t_{\mathrm{diff}}$ are, respectively, the maximum time steps calculated for the explicit advection and diffusion terms. The accuracy of the simulation is mostly determined by the advection terms. Implicit algorithms are consequently most favorable when the diffusion time step is much smaller than $\Delta t_{\mathrm{adv}}$ (i.e. when $R \ll 1$). Stated differently, a fully explicit solver is cheaper than a mixed explicit/implicit scheme when $R \gg 1$. For Fourier methods, the maximum advection Courant number of the third-order Runge-Kutta method is 0.6, which leads to

$$\Delta t_{\mathrm{adv}} = 0.6\,\frac{\Delta x}{v}, \qquad (65)$$

where $v$ is a representative fluid velocity. (The estimates in this section are for a one-dimensional problem.) On the other hand, the diffusion time limit is

$$\Delta t_{\mathrm{diff}} = 0.25\,\frac{\Delta x^2}{\nu + \nu_e}. \qquad (66)$$

In this discussion, all the variables are assumed to be constant. Equations (65) and (66) combine into

$$R = \frac{0.25}{0.6}\,\frac{v\,\Delta x}{f(v)\,C_R\,n_\Delta^2\,\Delta x^2 + \nu}. \qquad (67)$$

Several substitutions have been made to arrive at Eq. (67). The filter width $\Delta$ has been replaced by $n_\Delta \Delta x$ to separate the computational grid size from the actual filter width. Thus, when $n_\Delta$ is increased, the filtering is stronger, and more high frequencies are removed from the large-eddy velocities. The velocity dependence of the Smagorinsky model is included in $f(v)$, whose magnitude is a slowly decaying function of $n_\Delta$.
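The estimate (64)-(67) is simple enough to evaluate directly. The sketch below computes $R$ from (65) and (66), modelling the eddy viscosity as $\nu_e = f(v)\,C_R\,n_\Delta^2\,\Delta x^2$ so that it matches the denominator of (67). All numerical values in the example are illustrative placeholders, not parameters from the simulations, and `step_ratio` is a hypothetical name.

```python
import math

def step_ratio(dx, v, nu, f_v, c_r, n_delta):
    """Return R = dt_diff / dt_adv for the one-dimensional estimates (64)-(67)."""
    nu_e = f_v * c_r * (n_delta * dx) ** 2      # eddy viscosity, as in the denominator of (67)
    dt_adv = 0.6 * dx / v                       # Eq. (65): advective time-step limit
    dt_diff = 0.25 * dx**2 / (nu + nu_e)        # Eq. (66): diffusive time-step limit
    return dt_diff / dt_adv                     # Eq. (64)

# Placeholder parameters: a 32-point periodic grid on [0, 2*pi).
base = dict(dx=2 * math.pi / 32, v=1.0, nu=1e-3, f_v=1.0, n_delta=2.0)
for c_r in (0.04, 0.02, 0.01):
    print(c_r, step_ratio(c_r=c_r, **base))
```

Decreasing $C_R$ (or $n_\Delta$, or $\nu$) shrinks the denominator of (67), so $R$ grows and the explicit scheme becomes relatively more attractive, which is the trend discussed next.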


As confirmed in Table 8, the convergence rate improves with increasing $\lambda$. However, according to Eq. (62), a higher $\lambda$ is the result of a decrease in $n_\Delta$, $C_R$ or $\nu$ (other sources of variation are not considered here). Equation (67) therefore clearly indicates that a higher $\lambda$ reduces the gain of the multigrid scheme over the purely explicit scheme, and this is borne out by Table 8. The multigrid code runs 2 times slower than the explicit scheme when $\lambda = 130$ and 9 times slower when $\lambda = 1000$, although $\bar{\mu}$ has dropped from 0.58 to 0.12. Furthermore, as $\lambda$ increases, so does $R$, which explains why the multigrid code performs so poorly (compared to the explicit scheme) for large $\lambda$. The variation of the results in Table 8 is not uniformly monotonic as a function of $\lambda$. This is partly because the timings depend on the load on the Cray 2, and partly because of the interaction between the various independent control parameters. For example, if $\lambda$ is increased through a smaller value of $n_\Delta$, the high-frequency content in the velocities is reduced and better SMG convergence rates are expected, not only because of the larger Helmholtz coefficient but also because of the smoother velocity fields. On the other hand, only the former cause of improvement remains when $C_R$ is decreased.

                   impl. code vs. expl. code
λ        μ̄         time/step    time/run
< 100    non-converged
130      0.58      16           2
200      0.75      21           2
240      0.26      12           2
480      0.21      10           3
540      0.20      12           4
700      0.20      14           7
1000     0.12      12           9

Table 8: Numerical results from SMG incorporated into the multigrid code. Figures refer to 5 complete time steps.

Finally, $a(\mathbf{r})$ was assumed to be constant in the above analysis. Most likely, the oscillatory nature of $a(\mathbf{r})$ will also strongly influence the performance of the SMG. When $\lambda$ drops below 100, the implicit scheme fails to converge; this might be related to the highly oscillatory coefficients that appear when simulating turbulent flows.

More efficient Helmholtz solvers must be developed before spectral multigrid algorithms will strongly outperform explicit schemes for the simulation of turbulent flows. The quasi-random coefficients in the LES equations also call for new spectral interpolation procedures to improve the robustness of the method.


7.  CONCLUSION 


Three-dimensional periodic Poisson and Helmholtz equations have been solved with a 3-D spectral multigrid algorithm. Convergence rates for the Poisson problem are best when weighted residual averaging is adopted. The spatially dependent coefficient $a(\mathbf{r})$ was allowed to vary by more than an order of magnitude without affecting overall convergence rates. Although weighted residual averaging is impractical for the 3-D Helmholtz equation, non-stationary Richardson iteration is a viable alternative for a wide range of $a$ and $\lambda$. At a fixed amplitude variation, high spatial frequency content of $a$ had a deleterious effect on the convergence rate for the Poisson equation. This is unfortunate since, in turbulent simulations, the variables are necessarily oscillatory.

The algorithms herein were successfully incorporated into a full 3-D, non-stationary incompressible LES code. In the range of parameters examined, the SMG was found to take at least twice as long as an explicit calculation. This is partly due to the relatively large spread of eigenvalues for a Fourier collocation algorithm. Another cause is most probably the strong and rapid variation of $a(\mathbf{r})$.
INTRODUCTION

We describe a class of multiscale algorithms for the solution of large sparse linear systems that are particularly well adapted to massively parallel supercomputers. While standard multigrid algorithms are unable to use all processors effectively when computing on coarse grids, the new algorithms utilize the same number of processors at all times. The basic idea is to solve many coarse scale problems simultaneously, combining the results in an optimal way to provide an improved fine scale solution. As a result, convergence rates are much faster than for standard multigrid methods: we have obtained V-cycle convergence rates as good as .0046 with one smoothing application per cycle, and .0013 with two smoothings. On massively parallel machines the improved convergence rate is attained at no extra computational cost, since processors that would otherwise be sitting idle are utilized to provide the better convergence. On serial machines the algorithm is slower because of the extra time spent on multiple coarse scales, though in certain cases the improved convergence rate may justify this, particularly in cases where other methods do not converge.

In  constant  coefficient  situations  the  algorithm  is  easily  analyzed  theoretically  using 
Fourier  methods  on  a  single  grid.  The  fact  that  only  one  grid  is  involved  substantially 
simplifies  convergence  proofs.  A  feature  of  the  algorithms  is  the  use  of  a  matched  pair  of 
operators:  an  approximate  inverse  for  smoothing  and  a  super-interpolation  operator  to  move  the 
correction  from  coarse  to  fine  scales,  chosen  to  optimize  the  rate  of  convergence. 

1. Research supported by DOE contract DE-AC02-76ER03077 and by NSF grant DMS-8619856.




Parallel  Superconvergent  Multigrid 


1.  OVERVIEW 

In many situations the most efficient algorithms for the numerical solution of large sparse elliptic problems are the various multigrid algorithms [1-3]. Usually these methods are able to compute a solution with $N$ unknowns in $O(N)$ operations, asymptotically faster than other algorithms. Recently several efficient parallel implementations of multigrid algorithms have been reported on both SIMD and MIMD parallel computers [4-13]. We are interested here in the situation where the number of processors is so large that it makes sense to regard it as $O(N)$. Extrapolating from the serial case, one might then expect that it could be possible to solve a system of $N$ equations in time $O(1)$, independent of $N$. However, it is well known that one cannot in general solve a linear system of $N$ unknowns in time less than $O(\log N)$, no matter how many processors are available. Multigrid methods generally allow this theoretical limit to be achieved.

An efficient multigrid implementation by the second author on the Connection Machine [13], an SIMD computer with 65,536 processors, required $O(\log N)$ parallel operations. However, this algorithm was clearly not optimal. When computing on coarse grids, most of the 65,536 processors were in fact inactive; in the extreme case of the coarsest grid, only a single processor is actually doing anything useful. As a result, the observed computational time is substantially longer than what one might have expected from the equivalent serial algorithm.

The present paper takes a step towards solving this problem. The new algorithm still requires $O(\log N)$ parallel operations for solution, but the constant multiplying the $\log N$ is much smaller than before, because the more rapid convergence of the solution requires fewer iterations to reach a desired level of accuracy. This is accomplished by solving many coarse grid problems simultaneously and combining their results to provide an optimal finer grid approximation. No extra computation time is involved (if $N$ processors are available), since the extra coarse grid problems are solved on processors that would otherwise have been idle.

The algorithm PSMG (Parallel Superconvergent Multigrid) that we describe in section 2 uses an interpolation scheme which we term super-interpolation to speed convergence. A variant uses a projection called super-projection as an alternative to super-interpolation. We analyze the convergence of PSMG as a two-grid method in section 3. Numerical multigrid examples involving elliptic operators on rectangular grids are developed in section 4. These examples demonstrate the power of the PSMG algorithm. We note that in these examples the relevant operators are singular, and that PSMG then actually implements the Moore-Penrose pseudoinverse [14, 15]. We show elsewhere [16] that the PSMG method converges for a wide class of operators, even with only one smoothing operation per iteration. In some situations it reduces to an exact (direct) solver. Furthermore, bounds on the convergence rate are derived in [16] that are extremely sharp; for example, in some cases an upper bound for the multigrid convergence rate is within a few percent of the supremum of the two-grid convergence rate taken over all grid sizes. In this paper we simply illustrate the method by providing numerical evidence for the high convergence rate.

For simplicity, most of this paper deals with periodic boundary data. We remark that the method is not restricted to periodic boundary conditions: we have obtained similar convergence rates for Dirichlet and Neumann data by using anti-reflection or reflection boundary conditions in an extended periodic domain. Convergence rates for Dirichlet and Neumann problems are never worse than for the periodic problem.

We have also used the method successfully with non-constant coefficient problems. In the case of slowly varying coefficients the smoothing operator may be chosen locally to match the coefficients of the equation to be solved. Further studies are planned to explore anisotropic problems, as well as those with discontinuous coefficients or other singularities. Some preliminary results for three-dimensional problems are presented in section 4.2.3.


2.  THE  PSMG  ALGORITHM 

2.1.  The  Basic  Idea 

Consider a simple discretization problem on a 1-dimensional grid. Standard multigrid techniques work with a series of coarser grids, each obtained by eliminating every other point of the previous grid. The error equation for the fine grid is then projected to the coarse grid at every second point, the coarse grid equation is solved approximately, and the error is interpolated back to the fine grid and added to the solution there. Finally, a smoothing operation is performed on the fine grid. Recursive application of this procedure defines the complete multigrid procedure [1, 3].
The basic idea behind PSMG is the observation that for each fine grid there are two natural coarse grids: the even and the odd points of the fine grid. (For simplicity we assume that periodic boundary conditions are enforced.) Either of these coarse grids could be used at any point to construct the coarse grid solution, and both would presumably provide solutions of approximately equivalent quality. Why not try to combine both of these coarse grid solutions to provide a fine grid correction that is better than either separately? This should be possible since, in projecting from the fine grid, the odd and even points receive slightly different data in general, and thus each represents a slightly complementary view of the fine grid problem to be solved. Thus it ought to be possible to find a combination of the two solutions that is significantly better than either separately. It would follow immediately that such a scheme would converge faster (in fewer iterations) than the corresponding standard multigrid scheme. As a concrete example, if the combination of coarse grid solutions is simply the arithmetic average of the two standard coarse grid interpolation operators, then the algorithm would converge at least as well as the usual multigrid algorithm, since the convex combination of two (iteration) operators has norm bounded by the larger of the norms of the two operators. Note that on a massively parallel machine the two coarse grid problems may be solved simultaneously, in the same time as one of them would take; we assume here that the number of processors is comparable to the number of fine grid points. As will be seen below, both coarse grid problems are solved using the same set of machine instructions. Consequently the algorithm is well suited to SIMD parallel computers, as well as to MIMD machines. On machines with more modest numbers of processors it may still make sense to switch from standard MG to PSMG at grid levels where the number of grid points is less than the number of processors.

The idea outlined above extends naturally to multi-dimensional problems. In $d$ dimensions, $2^d$ coarse grids are obtained from a fine grid by selecting either the even or the odd points in each of the $d$ coordinate directions. The fine grid solution is then defined by performing a suitable linear interpolation of all $2^d$ coarse grid points.
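The even/odd decomposition is easy to express with array slicing. The NumPy sketch below (our illustration, with the hypothetical helper name `coarse_grids`) extracts the $2^d$ coarse grids of a periodic fine grid with an even number of points per direction and checks that together they cover every fine-grid point exactly once.

```python
import itertools
import numpy as np

def coarse_grids(fine):
    """Map each parity pattern (0 = even, 1 = odd, per axis) to its coarse sub-grid."""
    grids = {}
    for offsets in itertools.product((0, 1), repeat=fine.ndim):
        index = tuple(slice(o, None, 2) for o in offsets)
        grids[offsets] = fine[index]
    return grids

fine = np.arange(8 * 8).reshape(8, 8)      # 2-D fine grid; values serve as point labels
grids = coarse_grids(fine)
print(len(grids))                          # 2**d coarse grids
all_points = np.sort(np.concatenate([g.ravel() for g in grids.values()]))
print(np.array_equal(all_points, fine.ravel()))   # the coarse grids partition the fine grid
```

Because every coarse grid is obtained by the same strided access, the same instruction stream can process all of them at once, which is the property that makes the scheme attractive on SIMD machines.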

2.2.  Multigrid  Notation 

Suppose we are required to solve a discrete algebraic equation $A^{(L)} u = f$ on a rectangular grid $G^{(L)}$ with grid spacing or scale $h_L = 2^{-L} h$. We assume that the operator $A^{(L)}$ has natural scale $h_L$, as would be true for a difference operator on $G^{(L)}$. We introduce a spectrum of operators $A^{(l)}$, $l = 0, 1, \ldots, L$, each defined on all of $G^{(L)}$ and of scale $h_l = 2^{-l} h$. Starting from an initial guess $u$ on $G^{(L)}$ we construct the residual

$$r = f - A^{(L)} u = A^{(L)} e, \qquad e = \bar{u} - u,$$

where $\bar{u}$ is the exact solution and $e$ is the error. We will use the residual to construct an improved solution $u'$ of the form

$$u' = u + F^{(L)} r,$$

where $F^{(L)}$ is a linear operator on $G^{(L)}$. This results in a new error

$$e' = \bar{u} - u' = (I - F^{(L)} A^{(L)})\, e,$$

and a new residual

$$r' = A^{(L)} e' = (I - A^{(L)} F^{(L)})\, r.$$

Convergence of the above procedure is guaranteed provided that $F^{(L)}$ is an $\epsilon$-approximate inverse of $A^{(L)}$, i.e. if

$$\| I - A^{(L)} F^{(L)} \| \le \epsilon < 1.$$

This condition suffices to ensure that the residual $r$ is reduced in norm by a factor of at least $\epsilon$ per iteration. Provided that $A^{(L)}$ is invertible, it follows that the error also converges to 0, since $e = (A^{(L)})^{-1} r$. If $A^{(L)}$ is singular, the above iteration may still converge to a solution of the equation, provided $f$ is orthogonal to the null space of $A^{(L)}$.
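The convergence argument above can be checked on a small example. The sketch below runs the iteration $u' = u + F r$ for a 1-D Dirichlet Laplacian with the simple (assumed) choice $F = \tfrac14 I$, a damped-Jacobi-type approximate inverse, computes $\epsilon = \| I - AF \|_2$, and confirms that each step reduces the residual norm by at least that factor. The matrix and the choice of $F$ are illustrative only.

```python
import numpy as np

def approximate_inverse_iteration(A, F, f, u, steps):
    """Apply u <- u + F (f - A u) and record the residual norm before each step."""
    norms = []
    for _ in range(steps):
        r = f - A @ u
        norms.append(np.linalg.norm(r))
        u = u + F @ r
    return u, norms

n = 15
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # unscaled 1-D Dirichlet Laplacian
F = 0.25 * np.eye(n)                                    # simple approximate inverse
eps = np.linalg.norm(np.eye(n) - A @ F, 2)              # contraction factor ||I - A F||_2
rng = np.random.default_rng(1)
f = rng.standard_normal(n)
u, norms = approximate_inverse_iteration(A, F, f, np.zeros(n), steps=50)
print(eps, norms[0], norms[-1])
```

Here $\epsilon = (1 + \cos(\pi/16))/2 \approx 0.99$, so this single-grid iteration converges only slowly; the multiscale construction that follows is aimed at making $\epsilon$ small.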

2.3.  The  Two-Grid  PSMG  Algorithm 

In the two-grid PSMG algorithm, we approximate the error $e$ by the exact solution $e'$ of the coarse scale equation

$$A^{(L-1)} e' = r.$$

Note that since $A^{(L-1)}$ is by fiat defined on all of $G^{(L)}$, the error equation is being solved on the fine grid, which may be regarded as the union of a set of coarse grids. It is for this reason that we prefer the name multiscale rather than multigrid as a description of the algorithm. Having said this, we will frequently lapse in the sequel into the more familiar term coarse grid rather than coarse scale!

Next we combine the multiple coarse grid solutions defined by $e'$ into a fine grid correction $e''$ by applying a linear interpolation of the form

$$e'' = Q^{(L)} e',$$

leading to an improved fine grid solution

$$u' = u + e''.$$

As in standard multigrid procedures, the final step involves a smoothing operation on the fine grid:

$$u'' = S^{(L)}(u', f) = (I - Z^{(L)} A^{(L)})\, u' + Z^{(L)} f.$$

By suitably choosing $Q^{(L)}$ and $Z^{(L)}$, the above procedure should lead to convergent solutions. In particular, our strategy will involve choosing pairs $Q^{(L)}$, $Z^{(L)}$ that optimize the convergence rate of the algorithm for a given $A^{(L)}$.

We note that the two-grid PSMG algorithm may be described in the form

$$\bar{u} - u'' = \mathcal{T}^{(L)} e = (I - T^{(L)} A^{(L)})\, e,$$

where the two-grid iteration operator $\mathcal{T}^{(L)} \equiv I - T^{(L)} A^{(L)}$ is determined by

$$T^{(L)} = Z^{(L)} + (I - Z^{(L)} A^{(L)})\, Q^{(L)} (A^{(L-1)})^{-1}.$$

We define the two-grid convergence rate $\tau$ of this iteration procedure as the quantity

$$\tau = \sup_l \| \mathcal{T}^{(l)} \|.$$

Clearly $\tau$ provides a bound on the convergence rate per iteration of the two-grid method on any grid.

We have assumed here that the $A^{(l)}$ are invertible. If the operators $A^{(l)}$ are singular, this equation has an appropriate interpretation in terms of the Moore-Penrose pseudo-inverse $A^{(l)+}$ of $A^{(l)}$ [14, 15], which we will not consider here in detail. For the cases of interest in this paper we will suppose that singular operators, such as the Laplacian with periodic boundary conditions, are regularized by addition of a small diagonal operator. The convergence rates discussed below will then be uniform bounds as the diagonal regularization is reduced to 0. For the remainder of the paper we will assume that such regularization has been performed for all singular operators.
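The two-grid step is easy to try out in one dimension, where every operator is a circulant and can be applied with the FFT. The sketch below is our own 1-D illustration, not taken from the paper: it uses the 3-point Laplacian of scale $h$ for $A^{(L)}$, the same stencil with stride 2 (scale $2h$) on all fine points for $A^{(L-1)}$, the interpolation star $(q_0/2,\ q_0,\ q_0/2)$ and the smoother $Z = z_0 h^2 I$. Instead of regularizing, it applies the pseudo-inverse by skipping the zero Fourier modes of $A^{(L-1)}$. Carrying the pole-cancellation argument of section 3 over to 1-D (our derivation) gives $\hat{C}_k = 1 - 2q_0$ on every mode other than $k = 0$ and $k = n/2$, so $q_0 = 1/2$ together with $z_0 = 1/4$ should make a single two-grid step exact for mean-zero errors, while other choices of $q_0$ should not.

```python
import numpy as np

def laplacian_symbol(n, stride=1):
    """Fourier symbol of the periodic 3-point -d^2/dx^2 stencil with spacing stride*h, h = 1/n."""
    theta = 2 * np.pi * stride * np.arange(n) / n
    return (2 - 2 * np.cos(theta)) / (stride / n) ** 2

def apply_symbol(symbol, v):
    """Apply a periodic (circulant) operator given by its Fourier symbol."""
    return np.fft.ifft(symbol * np.fft.fft(v)).real

def psmg_two_grid_1d(f, u, q0=0.5, z0=0.25):
    """One two-grid PSMG step for -u'' = f on a periodic 1-D grid with n points."""
    n = f.size
    h = 1.0 / n
    x = np.cos(2 * np.pi * np.arange(n) / n)
    A_fine = laplacian_symbol(n)               # A^(L): scale h
    A_coarse = laplacian_symbol(n, stride=2)   # A^(L-1): scale 2h, defined on all fine points
    Q = q0 * (1 + x)                           # symbol of the star (q0/2, q0, q0/2)

    # Coarse-scale solve A^(L-1) e' = r; zero modes of A^(L-1) are skipped,
    # which plays the role of the pseudo-inverse instead of a regularization.
    r_hat = np.fft.fft(f - apply_symbol(A_fine, u))
    keep = A_coarse > 1e-12 * A_coarse.max()
    e_hat = np.zeros_like(r_hat)
    e_hat[keep] = r_hat[keep] / A_coarse[keep]
    u = u + np.fft.ifft(Q * e_hat).real        # u' = u + Q e'

    # Smoothing: u'' = u' + Z (f - A u') with Z = z0 h^2 I.
    return u + z0 * h**2 * (f - apply_symbol(A_fine, u))

n = 64
rng = np.random.default_rng(2)
u_exact = rng.standard_normal(n)
u_exact -= u_exact.mean()                      # drop the constant (null-space) mode
f = apply_symbol(laplacian_symbol(n), u_exact)
for q0 in (0.5, 0.4):
    err = np.max(np.abs(psmg_two_grid_1d(f, np.zeros(n), q0=q0) - u_exact))
    print(q0, err)
```

With $q_0 = 1/2$ the error is at round-off level after one step; with $q_0 = 0.4$ every non-constant mode keeps a factor $0.2\,(1 + x_k)/2$ of its error.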

Instead of simply projecting the fine grid residuals to the coarse grid, a variant applies the operation $Q$ to transfer the residuals (we call this a super-projection), and in that case the interpolation of the coarse grid errors back to the fine grid is performed by simple injection. The only effect this has on the equations above is to reverse the order of $(A^{(L-1)})^{-1}$ and $Q^{(L)}$. In the commuting constant coefficient case this in fact gives the same algorithm, though the algorithms are different in general. More generally, one could perform both non-trivial projection and interpolation operations.

2.4.  The  Recursive  PSMG  Algorithm 

We obtain the full PSMG algorithm by recursive application of the two-grid algorithm described above. The corresponding error correction then takes the form

$$\bar{u} - u'' = \mathcal{M}^{(L)} e = (I - M^{(L)} A^{(L)})\, e,$$

where the multigrid iteration operator $\mathcal{M}^{(l)} \equiv I - M^{(l)} A^{(l)}$ is determined by

$$M^{(l)} = Z^{(l)} + (I - Z^{(l)} A^{(l)})\, Q^{(l)} M^{(l-1)}, \qquad l = L, \ldots, 1,$$

with $M^{(0)} = (A^{(0)})^{-1}$. We define the multigrid convergence rate of this procedure as the quantity

$$\mu = \sup_l \| \mathcal{M}^{(l)} \|.$$

Clearly $\mu$ provides a bound on the convergence rate of PSMG on any grid. We show elsewhere [16] that the PSMG iteration defined above converges for a wide class of operators, even with only one smoothing operation per iteration. Furthermore, bounds on the convergence rate $\mu$ are derived that are extremely sharp. The same comments apply for the case of singular $A$ as were made at the end of the two-grid discussion.
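Because every $M^{(l)}$ is again translation invariant when the $A^{(l)}$, $Q^{(l)}$ and $Z^{(l)}$ are, the recursion for $M^{(l)}$ can be followed mode by mode. The sketch below is our 1-D illustration under assumed stencils (the 3-point Laplacian, the interpolation star $(q_0/2, q_0, q_0/2)$ and $Z^{(l)} = z_0 h_l^2 I$, with $h = 1$ so that $h_l = 2^{-l}$ and stride $d_l = 2^{L-l}$); it evaluates the Fourier symbol of $\mathcal{M}^{(L)} = I - M^{(L)} A^{(L)}$. On this periodic grid $A^{(0)}$ vanishes identically, so $M^{(0)}$ is taken as its pseudo-inverse, zero. For $q_0 = 1/2$, $z_0 = 1/4$ the multiplier vanishes on every mode except the constant one, a 1-D instance of PSMG acting as a direct solver; this is our computation, not a result quoted from [16].

```python
import numpy as np

def psmg_multiplier_1d(L, q0=0.5, z0=0.25):
    """Fourier symbol of I - M^(L) A^(L) for recursive 1-D PSMG on n = 2**L points (L >= 1)."""
    n = 2**L
    k = np.arange(n)
    m = np.zeros(n)                        # M^(0): pseudo-inverse of A^(0), which is identically zero
    for l in range(1, L + 1):
        d = 2 ** (L - l)                   # stride d_l in index space
        h = 2.0 ** (-l)                    # scale h_l (coarsest scale h = 1)
        x = np.cos(2 * np.pi * k * d / n)
        a = (2 - 2 * x) / h**2             # A^(l): 3-point Laplacian of scale h_l
        q = q0 * (1 + x)                   # Q^(l): star (q0/2, q0, q0/2)
        z = z0 * h**2                      # Z^(l) = z0 h_l^2 I
        m = z + (1 - z * a) * q * m        # M^(l) = Z + (I - Z A) Q M^(l-1)
    return 1 - m * a

mult = psmg_multiplier_1d(L=6)
print(mult[0], np.max(np.abs(mult[1:])))           # constant mode untouched; others removed
print(np.max(np.abs(psmg_multiplier_1d(L=6, q0=0.4)[1:])))
```

The same loop, with 2-D symbols in place of the 1-D ones, is one way to estimate $\mu$ numerically for the stencils of section 3.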

3.  FOURIER  MODE  ANALYSIS 

In order to complete the description of the algorithm it is essential to define the operators $Q^{(l)}$ and $Z^{(l)}$ used for interpolation and smoothing. In this section, we describe how to choose $Q^{(l)}$ and $Z^{(l)}$ in an optimal way for the special case of an operator with translation-invariant coefficients. We will illustrate the ideas for the Poisson equation discretized on a periodic rectangular grid $G^{(L)}$ of $N = n \times n$ points, $n = 2^L$, which we label with the index $i = (i_1, i_2)$, $0 \le i_1, i_2 < n$. We will use two discretizations of the negative Laplacian $-\Delta$ in our analysis. The first of these is the standard five-point discretization defined by
$$(A_5^{(l)} u)_i = h_l^{-2} \left( 4 u_i - u_{i-e_1} - u_{i+e_1} - u_{i-e_2} - u_{i+e_2} \right),$$

where $e_1$ and $e_2$ are integer vectors of length $d_l = 2^{L-l}$ in the coordinate directions in index space, or alternatively by the familiar five-point star notation:

$$A_5^{(l)} = h_l^{-2} \begin{bmatrix} & -1 & \\ -1 & 4 & -1 \\ & -1 & \end{bmatrix}.$$

The second discretization we will study is the more accurate Mehrstellen discretization, represented by the nine-point star

$$A_9^{(l)} = (6 h_l^2)^{-1} \begin{bmatrix} -1 & -4 & -1 \\ -4 & 20 & -4 \\ -1 & -4 & -1 \end{bmatrix}.$$

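Similarly, we will choose the operators $Q^{(l)}$ and $Z^{(l)}$ to be defined by simple symmetric three-parameter nine-point star operators (with appropriate scale length):

$$Q^{(l)} = \begin{bmatrix} q_{11} & q_1 & q_{11} \\ q_1 & q_0 & q_1 \\ q_{11} & q_1 & q_{11} \end{bmatrix}, \qquad Z^{(l)} = h_l^2 \begin{bmatrix} z_{11} & z_1 & z_{11} \\ z_1 & z_0 & z_1 \\ z_{11} & z_1 & z_{11} \end{bmatrix},$$

or equivalently in operator notation:

$$(Q^{(l)} u)_i = q_0 u_i + q_1 (u_{i+e_1} + u_{i-e_1} + u_{i+e_2} + u_{i-e_2}) + q_{11} (u_{i+e_1+e_2} + u_{i-e_1+e_2} + u_{i+e_1-e_2} + u_{i-e_1-e_2}),$$

$$(Z^{(l)} u)_i = h_l^2 \left( z_0 u_i + z_1 (u_{i+e_1} + u_{i-e_1} + u_{i+e_2} + u_{i-e_2}) + z_{11} (u_{i+e_1+e_2} + u_{i-e_1+e_2} + u_{i+e_1-e_2} + u_{i-e_1-e_2}) \right).$$

For simplicity, we take the parameters $q_i$ and $z_i$ to be independent of the scale parameter $l$.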

Since all of these operators are translation invariant, they are diagonalized by the discrete Fourier transform, and the analysis of the PSMG algorithm becomes particularly convenient. In the following we work entirely in Fourier transform space, where each of the operators $A^{(l)}$, $Q^{(l)}$ and $Z^{(l)}$ reduces to multiplication by a trigonometric function. In terms of the two-dimensional discrete Fourier transform on $G^{(L)}$,

$$\hat{u}_k = n^{-1} \sum_{j_1, j_2 = 0}^{n-1} e^{2\pi \mathrm{i}\, j \cdot k / n}\, u_j, \qquad 0 \le k_1, k_2 < n,$$

the operators $A^{(l)}$ and $Q^{(l)}$ reduce to multiplication by the trigonometric polynomials

$$\hat{A}_{5,k}^{(l)} = h_l^{-2} \left( 4 - 2\cos(2\pi k_1 d_l/n) - 2\cos(2\pi k_2 d_l/n) \right),$$

$$\hat{A}_{9,k}^{(l)} = (6 h_l^2)^{-1} \left( 20 - 8\cos(2\pi k_1 d_l/n) - 8\cos(2\pi k_2 d_l/n) - 2\cos(2\pi (k_1+k_2) d_l/n) - 2\cos(2\pi (k_1-k_2) d_l/n) \right),$$

$$\hat{Q}_k^{(l)} = q_0 + 2 q_1 \left( \cos(2\pi k_1 d_l/n) + \cos(2\pi k_2 d_l/n) \right) + 2 q_{11} \left( \cos(2\pi (k_1+k_2) d_l/n) + \cos(2\pi (k_1-k_2) d_l/n) \right),$$

with a similar expression for $\hat{Z}_k^{(l)}$ in terms of the parameters $z_0$, $z_1$ and $z_{11}$.

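With the notation $x_i^{(l)} = \cos(2\pi k_i d_l / n)$, $i = 1, 2$, and observing that

$$\cos(2\pi (k_1+k_2) d_l/n) + \cos(2\pi (k_1-k_2) d_l/n) = 2 x_1^{(l)} x_2^{(l)},$$

these expressions simplify. For example, the two discretizations of $-\Delta$ become

$$\hat{A}_5^{(l)} = h_l^{-2} \left( 4 - 2 (x_1^{(l)} + x_2^{(l)}) \right),$$

$$\hat{A}_9^{(l)} = (6 h_l^2)^{-1} \left( 20 - 8 (x_1^{(l)} + x_2^{(l)}) - 4 x_1^{(l)} x_2^{(l)} \right).$$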
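These trigonometric forms can be confirmed numerically by taking the 2-D FFT of each star placed on a periodic grid (with $h_l = 1$ and $d_l = 1$). The sketch below does this for the five-point and Mehrstellen stars; the helper name `stencil_symbol` is ours. Because both stars are symmetric, the sign convention of the transform does not affect the result.

```python
import numpy as np

def stencil_symbol(stencil, n):
    """Fourier symbol of a centered 3x3 star applied on a periodic n x n grid (h = 1, d = 1)."""
    kernel = np.zeros((n, n))
    for (i, j), w in np.ndenumerate(stencil):
        kernel[(i - 1) % n, (j - 1) % n] = w    # place offset (i-1, j-1) with periodic wrap
    return np.fft.fft2(kernel).real

n = 16
x = np.cos(2 * np.pi * np.arange(n) / n)
x1, x2 = np.meshgrid(x, x, indexing="ij")        # x_i = cos(2 pi k_i / n)

five_point = np.array([[0, -1, 0], [-1, 4, -1], [0, -1, 0]], dtype=float)
mehrstellen = np.array([[-1, -4, -1], [-4, 20, -4], [-1, -4, -1]], dtype=float) / 6

print(np.allclose(stencil_symbol(five_point, n), 4 - 2 * (x1 + x2)))
print(np.allclose(stencil_symbol(mehrstellen, n), (20 - 8 * (x1 + x2) - 4 * x1 * x2) / 6))
```

Both comparisons print `True`.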

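Application of the two-grid iteration operator $\mathcal{T}^{(l)} = I - T^{(l)} A^{(l)}$ over the grid $G^{(L)}$ reduces to multiplication by the function

$$\hat{\mathcal{T}}_k^{(l)} = \hat{S}_k^{(l)} \hat{C}_k^{(l)},$$

where $\hat{S}_k^{(l)} = 1 - \hat{Z}_k^{(l)} \hat{A}_k^{(l)}$ and $\hat{C}_k^{(l)} = 1 - \hat{Q}_k^{(l)} \hat{A}_k^{(l)} / \hat{A}_k^{(l-1)}$ are the Fourier representations of the smoothing and coarse-scale correction operators, respectively.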

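Writing $x_i$ for $x_i^{(l)}$, the kernel of the smoothing operator is given by

$$\hat{S}_k^{(l)} = 1 - h_l^2 \left( z_0 + 2 z_1 (x_1 + x_2) + 4 z_{11} x_1 x_2 \right) \hat{A}_k^{(l)}.$$

For the five-point discretization $A_5^{(l)}$ we obtain

$$\hat{C}_k^{(l)} = 1 - 2 \left( q_0 + 2 q_1 (x_1 + x_2) + 4 q_{11} x_1 x_2 \right) \frac{2 - x_1 - x_2}{2 - x_1^2 - x_2^2},$$

while for the nine-point discretization $A_9^{(l)}$ we have

$$\hat{C}_k^{(l)} = 1 - 2 \left( q_0 + 2 q_1 (x_1 + x_2) + 4 q_{11} x_1 x_2 \right) \frac{5 - 2 (x_1 + x_2) - x_1 x_2}{4 - x_1^2 - x_2^2 - 2 x_1^2 x_2^2}.$$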

In either case $\hat{C}_k^{(l)}$ has apparent poles at the four points $|x_1^{(l)}| = |x_2^{(l)}| = 1$, which of course are the zeroes of the coarse grid difference operator. The pole at $x_1^{(l)} = x_2^{(l)} = 1$ is canceled by a corresponding zero in the numerator, but this is not so for the other three poles. We cancel the remaining zeroes of the denominator by carefully choosing the three parameters $q_i$. It is easily checked that the two conditions

$$q_1 = q_0/2, \qquad q_{11} = q_0/4$$

suffice, leaving one free parameter $q_0$ in the $Q$ operator to be chosen later. The resulting form of $\hat{C}^{(l)}$ is then

$$\hat{C}_k^{(l)} = 1 - 2 q_0 \left( 1 + x_1^{(l)} + x_2^{(l)} + x_1^{(l)} x_2^{(l)} \right) \frac{2 - x_1^{(l)} - x_2^{(l)}}{2 - (x_1^{(l)})^2 - (x_2^{(l)})^2}$$

for the five-point operator, with a similar expression for the nine-point operator. Note that the same restrictions on $Q$ are required for the five- and nine-point operators.

One further constraint is necessary if the multigrid iteration operator $I - M^{(l)} A^{(l)}$ is to have a bound independent of $l$, namely the constraint that the coarse grid correction operator $C^{(l)}$ vanish at the origin in frequency space; see [16] for the full explanation. Since the expression above tends to $1 - 4q_0$ as $x_1^{(l)}, x_2^{(l)} \to 1$, this constraint leads to the very simple condition $q_0 = .25$ on the interpolation operator $Q^{(l)}$, which is therefore uniquely determined.
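The pole-cancellation and normalization arguments can be checked numerically with the five-point form of $\hat{C}^{(l)}$. In the sketch below (function name ours), a generic choice of $q_1, q_{11}$ blows up near the corner $x_1 = -1,\ x_2 = 1$, the constrained choice $q_1 = q_0/2$, $q_{11} = q_0/4$ stays bounded there, and the value near the origin $x_1 = x_2 = 1$ approaches $1 - 4q_0$, which vanishes for $q_0 = .25$. The sample points and the unconstrained parameter values are arbitrary.

```python
def coarse_correction_5pt(x1, x2, q0, q1, q11):
    """Five-point symbol C = 1 - Q A^(l) / A^(l-1), written in x_i = cos(theta_i)."""
    q_hat = q0 + 2 * q1 * (x1 + x2) + 4 * q11 * x1 * x2
    return 1 - 2 * q_hat * (2 - x1 - x2) / (2 - x1**2 - x2**2)

eps = 1e-6
near_corner = (-1 + eps, 1 - eps)     # close to the pole at x1 = -1, x2 = 1
near_origin = (1 - eps, 1 - eps)      # close to zero frequency

print(coarse_correction_5pt(*near_corner, q0=0.25, q1=0.1, q11=0.05))        # large
print(coarse_correction_5pt(*near_corner, q0=0.25, q1=0.125, q11=0.0625))    # bounded
for q0 in (0.25, 0.265840):
    print(q0, coarse_correction_5pt(*near_origin, q0=q0, q1=q0 / 2, q11=q0 / 4), 1 - 4 * q0)
```

The second value of $q_0$ is the two-grid optimum reported in section 4.1; it gives $\hat{C} \approx -0.063$ near the origin, which is acceptable for the two-grid bound but not for a level-independent multigrid bound.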

The final step in the two-grid analysis is to choose the parameters $z_i$ so as to minimize the two-grid convergence rate $\tau = \sup_l \| \mathcal{T}^{(l)} \|$. Since $\mathcal{T}^{(l)}$ is a multiplication operator, its norm is the maximum value of $|\hat{\mathcal{T}}_k^{(l)}|$ evaluated over all relevant frequencies $k$, or equivalently, evaluated over those discrete points in the square $-1 \le x_1^{(l)}, x_2^{(l)} \le 1$ that correspond to Fourier frequencies on the grid $G^{(L)}$. Evaluation of this maximum is a strictly algebraic optimization problem, and in fact $\hat{\mathcal{T}}^{(l)}$ is a quotient of low-degree polynomials in the $x_i^{(l)}$. By maximizing over continuous variables $x_i$ in the square $[-1, 1]^2$, rather than over the discrete set $x_i^{(l)}$, we arrive at bounds that are uniform in both $l$ and $L$. This uniform norm estimate may then be minimized by varying the parameters $z_i$.


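To get an improved convergence rate we have also tried using a 25-point star operator to define $Q$:

$$Q = \begin{bmatrix} q_{22} & q_{12} & q_2 & q_{12} & q_{22} \\ q_{12} & q_{11} & q_1 & q_{11} & q_{12} \\ q_2 & q_1 & q_0 & q_1 & q_2 \\ q_{12} & q_{11} & q_1 & q_{11} & q_{12} \\ q_{22} & q_{12} & q_2 & q_{12} & q_{22} \end{bmatrix}.$$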
Again we compute an explicit rational function expression for the coarse-scale operator as a function of the trigonometric variables $x_i^{(l)}$. As before, there are three poles of the denominator in $\hat{C}^{(l)}$, which we cancel by careful choice of $Q$, leading to the two constraints

$$0 = q_0 - 4 q_1 + 4 q_{11} + 4 q_2 + 4 q_{22} - 8 q_{12},$$

$$0 = q_0 - 4 q_{11} + 4 q_2 + 4 q_{22}.$$

It follows that there are 4 independent coefficients of $Q$, along with the 3 parameters of $Z$, that are available to minimize $\tau$ in the two-grid analysis. In the multigrid analysis we have one fewer free parameter than in the two-grid case, for we must add the constraint

$$1 = q_0 + 4 q_1 + 4 q_{11} + 4 q_2 + 4 q_{22} + 8 q_{12},$$

required to ensure that the coarse grid correction operator $C^{(l)}$ vanishes at the origin in frequency space.
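These constraints can be checked against the optimal 25-point coefficients reported in section 4.1. In the sketch below (our derivation, not the paper's code), the symbol of the $5 \times 5$ star is written in terms of $x_i = \cos\theta_i$ and $c_i = \cos 2\theta_i = 2x_i^2 - 1$. The two pole conditions are exactly $\hat{Q} = 0$ at the corners $(x_1, x_2) = (-1, -1)$ and $(-1, 1)$, and the multigrid condition is $\hat{Q} = 1$ at $(1, 1)$.

```python
def q_symbol_25(x1, x2, q0, q1, q2, q11, q12, q22):
    """Symbol of the symmetric 5x5 Q star, with x_i = cos(theta_i), c_i = cos(2 theta_i)."""
    c1, c2 = 2 * x1**2 - 1, 2 * x2**2 - 1
    return (q0 + 2 * q1 * (x1 + x2) + 4 * q11 * x1 * x2
            + 2 * q2 * (c1 + c2) + 4 * q22 * c1 * c2
            + 4 * q12 * (x1 * c2 + c1 * x2))

# Two-grid optimal coefficients from section 4.1.
q25 = dict(q0=.375881, q1=.109816, q2=-.0374773, q11=.0639978, q12=.0090898, q22=.00750485)

print(q_symbol_25(-1, -1, **q25))   # first pole constraint
print(q_symbol_25(-1, 1, **q25))    # second pole constraint
print(q_symbol_25(1, 1, **q25))     # equals 1 only if the multigrid constraint holds
```

The two pole constraints vanish to the printed precision, while $\hat{Q}(1, 1) \approx 1.024$: these two-grid coefficients do not satisfy the extra multigrid constraint, which the two-grid bound does not require.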


4.  NUMERICAL  EXAMPLES 

4.1.  Two-grid  Convergence 

We have estimated uniform bounds for the norm of the two-grid iteration operator, $\| \mathcal{T}^{(l)} \|$, by computing the maximum of $|\hat{\mathcal{T}}^{(l)}|$ over a fine rectangular $1000 \times 1000$ grid of values of the trigonometric functions $x_i$. Since the polynomials involved are of low order, the exact location of the maximum can probably be found explicitly, but this is not likely to change the estimate significantly. The above bound was computed numerically within a numerical optimization procedure that itself sought the minimum of the bound over the parameters $q_i$ and $z_i$, using a variant of a bisection method in each parameter. It is not essential to locate the exact optimal values of the parameters, since a small error results in an almost optimal convergence rate. Having located a reasonable estimate of the optimal parameters, one can then perform a more careful estimate of the uniform bound on $\| \mathcal{T}^{(l)} \|$ for that choice, for example by maximizing over a very large grid of $x_i$, or perhaps by explicit analytic estimates.

In the case of the five-point discretization of the Laplacian with a nine-point star $Q$ operator, there are, as we have seen, four free parameters: $q_0$ and the $z_i$. The optimization procedure in that case led to the optimal parameter choice:
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<70  =  .265840  , 

z0  =  . 257070,  zt  =  . 041304,  zn  =  .006308  . 

For this choice of parameters we find numerically that

$$\tau = \sup_{\xi} \|T(\xi)\| < .06329,$$

which gives a reasonable two-grid convergence rate for the method. When we take Q to be a 25-point star operator (i.e., a 5×5 star), the optimal choice consists of the same Z parameters as above, together with:

$q_0 = .375881, \quad q_1 = .109816, \quad q_2 = -.0374773,$

$q_{11} = .0639978, \quad q_{12} = .0090898, \quad q_{22} = .00750485,$

leading to the numerical estimate of the two-grid convergence rate:

$$\tau < .025.$$

For the Mehrstellen discretization, with nine-point star Q and Z, we choose the interpolation

$$Q = \begin{pmatrix} .0625 & .1250 & .0625 \\ .1250 & .2500 & .1250 \\ .0625 & .1250 & .0625 \end{pmatrix},$$

and the optimization leads to the smoothing operator

$$Z = h^2 \begin{pmatrix} .0156655 & .0464891 & .0156655 \\ .0464891 & .3059000 & .0464891 \\ .0156655 & .0464891 & .0156655 \end{pmatrix},$$

producing a two-grid convergence rate (determined numerically as described previously) of $\tau = .0261$ per iteration for V-cycles. We note that the Q above satisfies the extra constraint mentioned previously ($q_0 = .25$), as required for multigrid convergence.

4.2.  Multigrid  Convergence 

In [16] we prove that a rigorous upper bound on the multigrid convergence rate of the PSMG algorithm is given by the expression:

jx*  =  syjjmgx  IT p I  (1  -  ITi'M  _  lSi'M)| 




This expression may be estimated using exactly the same techniques described above for the two-grid convergence rate. In fact, $\mu^*$ may itself serve as the objective of the optimization over the parameters $q_i$ and $z_i$, in place of the two-grid convergence rate $\tau = \sup_\xi \|T(\xi)\|$ discussed above. The resulting Q and Z operators are then guaranteed to give multigrid convergence factors no larger than $\mu^*$ in all situations. In some situations, optimizing the two-grid convergence rate $\tau$ leads to less than optimal multigrid convergence, and in some cases it may actually lead to a divergent multigrid scheme. Conversely, the parameter choice that is optimal for multigrid convergence is generally not optimal for two-grid convergence. However, we have observed that whenever $\mu^*$ is small ($\mu^* \ll 1$), $\tau$ and $\mu^*$ are very close.

We have performed optimizations based on the quantity $\mu^*$ for both the five-point and the nine-point Mehrstellen discretizations, as well as for the standard seven-point discretization in three dimensions. Several of the results are presented below. We refer to [16] for further details and for the complete discussion of the multigrid convergence rate.

4.2.1.  Five-point  Convergence  Rates 

For the five-point discretization with nine-point star Z and Q, optimization of the bound $\mu^*$ above leads to a rigorous multigrid convergence rate bound of approximately .2115 (with a corresponding two-grid rate of .1486) for V-cycles with one smoothing step per level. As we have seen, the 3×3 Q matrix is then fully determined. The optimal Z matrix in this case is defined by the parameters

$z_0 = .311393, \quad z_1 = .0761886, \quad z_{11} = .0249449.$

Improved results are obtained with a 25-point star Q, for which we obtain a multigrid convergence rate of .0831 (with a two-grid rate of .0632) for the parameter choice:

$q_0 = .391397, \quad q_1 = .111803, \quad q_2 = -.0413862,$

$q_{11} = .0625, \quad q_{12} = .00659854, \quad q_{22} = .00603699,$

$z_0 = .322645, \quad z_1 = .0857152, \quad z_{11} = .0308174.$
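The parameter values above can be checked directly against conditions on the symbol of Q. The sketch evaluates the Fourier symbol of a symmetric 5×5 star at the origin and at the aliased frequencies $(\pi, 0)$ and $(\pi, \pi)$. The dictionary layout and the offset convention for the $q_{jk}$ are our own assumptions. With the multigrid coefficients just listed, the symbol comes out close to 1 at the origin (the weights sum to 1 within the rounding of the printed digits) and close to 0 at both aliased frequencies. The two-grid 25-point coefficients of Section 4.1 also vanish at the aliased frequencies, but they sum to about 1.024. That appears consistent with the multigrid analysis imposing one additional condition.

```python
import math


def star_symbol(q, xi1, xi2):
    """Fourier symbol of a symmetric 5x5 star.

    Assumed convention: q["0"] is the centre weight, q["1"] and q["2"] sit at
    offsets (1,0) and (2,0), q["11"], q["12"], q["22"] at (1,1), (1,2), (2,2),
    each repeated over all symmetric images of that offset.
    """
    c1, c2 = math.cos(xi1), math.cos(xi2)
    d1, d2 = math.cos(2 * xi1), math.cos(2 * xi2)
    return (q["0"]
            + 2 * q["1"] * (c1 + c2)
            + 2 * q["2"] * (d1 + d2)
            + 4 * q["11"] * c1 * c2
            + 4 * q["12"] * (c1 * d2 + d1 * c2)
            + 4 * q["22"] * d1 * d2)


# Multigrid-optimal 25-point Q for the five-point operator (Section 4.2.1).
q_multigrid = {"0": .391397, "1": .111803, "2": -.0413862,
               "11": .0625, "12": .00659854, "22": .00603699}
# Two-grid-optimal 25-point Q for the five-point operator (Section 4.1).
q_two_grid = {"0": .375881, "1": .109816, "2": -.0374773,
              "11": .0639978, "12": .0090898, "22": .00750485}

frequencies = [(0, 0), (math.pi, 0), (math.pi, math.pi)]
for name, q in [("multigrid", q_multigrid), ("two-grid", q_two_grid)]:
    values = [star_symbol(q, *xi) for xi in frequencies]
    print(name, ["%.6f" % v for v in values])
```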
4.2.2.  Mehrstellen  Convergence  Rates

Convergence rates for the Mehrstellen discretization are dramatically better. With nine-point star Q and Z operators, we obtain a multigrid convergence rate bound $\mu^*$ of .02754 (with a two-grid bound of .02609) for V-cycles with a single smoothing step. This bound is obtained with the choice:

$z_0 = .3059, \quad z_1 = .0464891, \quad z_{11} = .0156655.$

By combining a 3×3 Z operator, a 5×5 Q operator and the Mehrstellen operator, we have constructed a PSMG scheme for the Poisson equation with a two-grid convergence rate $\tau$ of .00434 and an upper bound of .00446 on its multigrid convergence rate $\mu^*$. The corresponding optimal parameters are:

$q_0 = .341997, \quad q_1 = .0972999, \quad q_2 = -.0175355,$

$q_{11} = .0625, \quad q_{12} = .0138501, \quad q_{22} = -.00546389,$

$z_0 = .337042, \quad z_1 = .0629468, \quad z_{11} = .0245344.$

Faster convergence rates may be obtained by using more than one smoothing operation per level. For example, with two smoothing steps per level we obtain a multigrid convergence rate bound $\mu^*$ of .0013, for which the optimal parameter choice is:

$q_0 = .339308, \quad q_1 = .0976648, \quad q_2 = -.0168118,$

$q_{11} = .0625, \quad q_{12} = .0136676, \quad q_{22} = -.00551516,$

$z_0 = .351804, \quad z_1 = .0739205, \quad z_{11} = .0322007.$

This estimate is obtained by replacing the smoothing operator by its square, both in the expression for $\mu^*$ and in evaluating $T$ within that expression.

4.2.3.  Three  Dimensional  Convergence  Rates 

Finally, we mention that the same techniques extend to three-dimensional operators. As an illustration, we consider the standard 7-point discretization of the Laplacian with 3×3×3 Z and Q matrices. After appropriate symmetry constraints are applied, each of Z and Q can be described by four parameters. Following the notation of the two-dimensional case, we denote the four parameters of Q by $q_0$, $q_1$, $q_{11}$ and $q_{111}$, with a similar notation for the parameters of Z. Here $q_{111}$, for example, refers to a term connecting a grid point to the neighbor one grid unit away in each of the three coordinate directions. The requirement of canceling the poles of the coarse-grid operator inverse in the two-grid iteration operator T introduces three constraints for Q. The requirement that the coarse-grid operator vanish at the origin in frequency space (required for multigrid convergence) imposes one further constraint. With these constraints Q is fully determined, with parameters:

$q_0 = 0.125, \quad q_1 = 0.0625, \quad q_{11} = 0.03125, \quad q_{111} = 0.015625.$

We therefore optimize only over the four Z parameters, and obtain an optimal multigrid convergence rate bound of .3313 (with a two-grid bound of .1965). The corresponding optimal Z parameters are:

$z_0 = .206492, \quad z_1 = .0427567, \quad z_{11} = .0141576, \quad z_{111} = .0045795.$
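The fully determined 3D Q above, like the 3×3 Q chosen for the Mehrstellen case in Section 4.1, coincides numerically with a tensor power of the one-dimensional stencil [1/4, 1/2, 1/4], the familiar full-weighting stencil. This observation comes from comparing the listed values, not from the text. The sketch builds the tensor powers so they can be checked against those values; the helper name is hypothetical.

```python
from itertools import product


def tensor_stencil(weights_1d, dim):
    """Return {offset: weight} for the dim-fold tensor power of a 1-D stencil."""
    half = len(weights_1d) // 2
    stencil = {}
    for offset in product(range(-half, half + 1), repeat=dim):
        w = 1.0
        for o in offset:
            w *= weights_1d[o + half]
        stencil[offset] = w
    return stencil


fw = [0.25, 0.5, 0.25]
q2 = tensor_stencil(fw, 2)
q3 = tensor_stencil(fw, 3)

print(q2[(0, 0)], q2[(1, 0)], q2[(1, 1)])            # -> 0.25 0.125 0.0625
print(q3[(0, 0, 0)], q3[(1, 0, 0)], q3[(1, 1, 0)], q3[(1, 1, 1)])
# -> 0.125 0.0625 0.03125 0.015625  (q0, q1, q11, q111 in the text)
print(sum(q3.values()))                              # -> 1.0
```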

We have obtained substantially faster convergence rates for 3D versions of the Mehrstellen operator (see [16] for details), as well as by using more than one smoothing operation per grid level.
Multiblock  Multigrid  Solution  of  the
Implicit  Time  Advance  Equations 
for  Magnetic  Resistive  Diffusion  in 
Geometrically  Complex  Regions 

M.  H.  Frese 
INTRODUCTION


A theorist who wishes his computations to be of maximum value to his experimental colleagues must do problems in real experimental geometries. Further, he must be prepared to repeat calculations with variations in geometry and physical boundary conditions to investigate the effects of the changes which the experimenter can most easily make. Since solutions in idealized geometries may not exhibit the same behavior as experiments, the ability to handle a wide variety of complex geometries is a key requirement for a finite difference code. The multiblock approach is a way of doing finite difference calculations in complex geometry with the minimum level of code complexity.


1.  OVERVIEW  OF  THE  MULTIBLOCK  APPROACH

A multiblock calculation is essentially a collection of separate, logically rectangular finite difference calculations coupled together by boundary conditions. The calculation in each block is carried out on a numerically generated grid that matches the grids of its neighbor blocks in a two-cell-wide overlap region at its edges. Because the cells of the grid may be arbitrary quadrilaterals and the grid is numerically generated, calculations may be done in quite complex regions of physical space. The blocks function as macroelements in a scheme somewhat like a finite element scheme. In the multiblock approach, however, there are relatively few blocks, each composed of a moderate to large number of cells on which finite difference calculations are performed. Figure 1 shows a region of two-space decomposed into five blocks. This region could be decomposed into just two blocks, but requiring block corners to meet other blocks only at their corners keeps the numerical algorithms simpler.

In the multiblock data structure the coordinate system is described by a collection of pairs of arrays

$$\{(x^l_{ij},\, y^l_{ij})\}, \qquad l = 1, \ldots, l_{\max}, \qquad (1)$$

where the sizes of the arrays need not all be the same, but are given by

$$i = 1, \ldots, i_{\max}^l, \qquad j = 1, \ldots, j_{\max}^l. \qquad (2)$$

Further, each spatially dependent physical quantity $\psi$ is given by a similar collection of arrays

$$\{\psi^l_{ij}\}, \qquad l = 1, \ldots, l_{\max}, \qquad (3)$$

with a similar range of indices. Thus, $\psi$ is known as a function of the physical coordinates $(x, y)$ only parametrically.
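A minimal Python sketch of this data structure, assuming plain nested lists, 0-based indices, and hypothetical class and helper names. Each block carries its own coordinate arrays and field arrays of size imax_l × jmax_l, so blocks of different sizes coexist without padding. The uniform grid used to fill in the coordinates is for illustration only; the paper's grids are numerically generated quadrilaterals.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """One logically rectangular block; arrays are indexed [i][j], 0-based."""
    imax: int
    jmax: int
    x: list = field(default_factory=list)
    y: list = field(default_factory=list)
    fields: dict = field(default_factory=dict)

    def new_array(self, value=0.0):
        """An imax-by-jmax array sized for this block only."""
        return [[value] * self.jmax for _ in range(self.imax)]


def make_uniform_block(imax, jmax, x0, y0, dx, dy):
    """Hypothetical helper: a block on an axis-aligned patch of the plane."""
    b = Block(imax, jmax)
    b.x = [[x0 + i * dx for j in range(jmax)] for i in range(imax)]
    b.y = [[y0 + j * dy for j in range(jmax)] for i in range(imax)]
    b.fields["psi"] = b.new_array()
    return b


blocks = [make_uniform_block(8, 4, 0.0, 0.0, 0.25, 0.25),
          make_uniform_block(3, 4, 2.0, 0.0, 0.25, 0.25)]
for l, b in enumerate(blocks, start=1):
    print(l, b.imax, b.jmax, len(b.fields["psi"]), len(b.fields["psi"][0]))
```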

The key idea of the multiblock approach is to use an additional row of cells around each block to couple neighbor blocks and to implement boundary conditions. These cells are referred to as ghost cells, since the data they contain is secondary to that in the interior of the blocks. The data in the other cells, referred to as real cells, is determined by time integration of difference equations. The ghost cell data, by contrast, is determined from the present values of real cell data by, at most, simple geometric computations.

The computations in neighbor blocks are coupled by using the ghost cells to represent the neighboring blocks. So, where a boundary of block A abuts that of another block B, data from B's edge cells is copied into the ghost cells of A along that boundary. This is done immediately before that data is to be used in block A. Because this is done for the values of the iteration variables at each step of any iteration, and each iteration step is completed for all cells of all blocks before the next one begins, block boundaries interior to the computational region do not inhibit convergence. In fact, with few exceptions, the data flow is designed so that splitting a single block into two or more blocks causes no change in the results of the calculations. Exceptions to this rule are necessary, especially at block corners, but only those which produce errors no larger than truncation error are permitted.
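The coupling rule can be illustrated on a one-dimensional simplification of our own. Blocks are lists with one ghost cell at each end, interior edges receive the neighbor's edge value, and the two outer edges use the sign-flip rule for $f = 0$ on the boundary described below. As in the text, every step refreshes all ghost cells before any real cell is updated. The final comparison checks that a 12-cell problem split into blocks of 5 and 7 cells produces the same iterates as the unsplit problem. The model equation ($-u'' = f$ relaxed by Jacobi) and all names are assumptions made for the sketch.

```python
def exchange_ghosts(blocks):
    """Refresh every ghost cell before any real cell is updated.

    Interior edges copy the neighbour's edge cell; the two outer edges use the
    sign-flip rule, which imposes f = 0 on the boundary face.
    """
    for k, u in enumerate(blocks):
        left = blocks[k - 1] if k > 0 else None
        right = blocks[k + 1] if k + 1 < len(blocks) else None
        u[0] = left[-2] if left is not None else -u[1]
        u[-1] = right[1] if right is not None else -u[-2]


def jacobi_step(blocks, f, h):
    """One simultaneous Jacobi update of -u'' = f on all blocks."""
    exchange_ghosts(blocks)
    new_blocks, offset = [], 0
    for u in blocks:
        n = len(u) - 2                       # number of real cells in this block
        new = u[:]
        for i in range(1, n + 1):
            new[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[offset + i - 1])
        new_blocks.append(new)
        offset += n
    return new_blocks


def split(values, sizes):
    """Partition a global array into blocks, adding one ghost cell at each end."""
    blocks, start = [], 0
    for n in sizes:
        blocks.append([0.0] + values[start:start + n] + [0.0])
        start += n
    return blocks


N, h = 12, 1.0 / 12
f = [1.0] * N
single = split([0.0] * N, [N])
multi = split([0.0] * N, [5, 7])
for _ in range(50):
    single = jacobi_step(single, f, h)
    multi = jacobi_step(multi, f, h)

flat_single = single[0][1:-1]
flat_multi = multi[0][1:-1] + multi[1][1:-1]
print(max(abs(a - b) for a, b in zip(flat_single, flat_multi)))
```

With this ordering the comparison is exact in floating point. Each updated cell sees the same operands in the same order whether its neighbor is a real cell or a ghost copy.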

FIG. 1. A typical multiblock problem. Boundaries not otherwise designated are perfect conductors. (One boundary in the figure is labeled rB_θ = CONSTANT.)

Boundary conditions are required on block edges which do not abut other blocks. These are applied by setting ghost cell values which obey the desired boundary condition. For example, to implement

$$\hat n \cdot \nabla f = 0 \quad \text{on } \partial R \qquad (4)$$

the ghost cells are made by reflecting the first row of interior vertices through the tangent to the outer boundary formed from the edge vertices, and the real values of $f$ adjacent to the boundary are copied across the boundary into the ghost cells. To force

$$f = 0 \quad \text{on } \partial R \qquad (5)$$

the same procedure is followed, except that the sign of the value of $f$ in the real cell is changed before it is copied into the ghost cell.
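The sketch below covers only the value-setting half of these rules. The geometric reflection of interior vertices that positions the ghost cells is omitted, and the one-dimensional row, the function name, and the face diagnostics are illustrative assumptions. Copying the adjacent real value makes the one-sided difference across the boundary face vanish (a zero normal derivative). Copying it with the sign changed makes the linearly interpolated face value vanish instead.

```python
def fill_boundary_ghosts(u, rule):
    """Set the left/right ghost cells (u[0], u[-1]) of a 1-D cell-centred row.

    rule == "copy":   ghost = adjacent real value (zero normal derivative)
    rule == "negate": ghost = -adjacent real value (f = 0 on the boundary face)
    """
    sign = {"copy": 1.0, "negate": -1.0}[rule]
    u[0] = sign * u[1]
    u[-1] = sign * u[-2]
    return u


row = [None, 3.0, 5.0, 4.0, None]          # None marks the ghost slots
for rule in ("copy", "negate"):
    u = fill_boundary_ghosts(row[:], rule)
    face_value = 0.5 * (u[0] + u[1])        # linear interpolation to the left face
    face_slope = u[1] - u[0]                # one-sided difference across that face
    print(rule, u, face_value, face_slope)
```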

The principal advantages of the multiblock approach derive from the fact that it applies to a formal class of geometrically complex problems, each composed of logically simple pieces. Hence, only a small amount of information must be given to describe any given problem, and the entire class can be described by a simple formal input language. Then, since the code deals only with the individual blocks, it may easily be made correct. Errors eliminated in one geometric context will not be reintroduced in another context by the code modifications that other approaches require to change geometry. In addition, there are no dead regions within any block; only the ghost cells are inactive. Since the conditional statements required for dead regions may be eliminated from the loops containing finite difference code, vectorizable code is easier to obtain.

The composite mesh method of Henshaw and Chesshire [1] is another approach to the problem of doing finite difference calculations in real geometries. It differs from the multiblock method in three ways: it allows meshes to overlap in the interior of logical blocks, it uses different difference equations on the boundary than in the interior of the blocks, and it has dead regions in the interior of the finite difference grid at which the results of the calculations are ignored.

Not all numerical algorithms are well suited to multiblock application, but the Full Approximation Storage multigrid algorithm can be implemented in a multiblock architecture to give exceptional flexibility and power. For concreteness, the problem of magnetic diffusion is used below to demonstrate how this may be done, but it should be clear that any of a wide variety of physical processes could be treated similarly.


2.  THE  PHYSICAL  PROBLEM 

Maxwell's equations for electromagnetic phenomena can be reduced to a diffusion equation for the magnetic field $\mathbf{B}$,

$$\frac{\partial \mathbf{B}}{\partial t} = -\nabla \times (\eta\, \nabla \times \mathbf{B}), \qquad (6)$$

if the effect of charge separation is ignored and the medium is assumed to be governed by the simple Ohm's law

$$\mathbf{E} = \eta\, \mathbf{J} \qquad (7)$$

relating the electric field $\mathbf{E}$ and the current density $\mathbf{J}$. The magnetic diffusivity $\eta$ can vary in space; in the problem solved here, it has sharp discontinuities.

Solutions to this equation are required in two spatial dimensions with either cylindrical or planar symmetry. For both symmetries, the equations governing the magnetic field component in the direction of symmetry are not coupled to those which govern the field transverse to that direction.

For the remainder of this paper, attention is restricted to the cylindrical case and to the diffusion of $B_\theta$, the component in the direction of symmetry. However, the algorithm described below has been implemented for the full magnetic field in both symmetry cases.

A well-posed problem for the field must include boundary conditions for $B_\theta$. Appropriate conditions include those which model perfectly conducting and insulating walls, as well as the axis of symmetry. The field in the conducting medium near a perfectly conducting wall is governed by the condition

$$\hat n \cdot \nabla (r B_\theta) = 0, \qquad (8)$$

where $r$ is the radial coordinate and $\hat n$ is the normal to the boundary. Near a perfectly insulating boundary it obeys

$$r B_\theta = kI, \qquad (9)$$

where $I$ is the total current flowing between the inner radial edge of the insulating surface and $r = 0$, and $k$ is a proportionality constant. The condition on the azimuthal field at the axis of symmetry is

$$B_\theta = 0 \quad \text{at } r = 0. \qquad (10)$$


3.  THE  NUMERICAL  PROBLEM 

It is desirable to solve Equation 6 with timesteps which exceed the local Courant stability limit

$$dt > \frac{(dL)^2}{\eta} \qquad (11)$$

at some places in the computational region, so an implicitly time-differenced form must be used. Here $dL$ is a measure of the local cell size. However, efficiency demands that convergence be immediate if this limit is not exceeded anywhere, so the mixed time-differenced form

$$\mathbf{B}^+ + \gamma\, dt\, \nabla \times (\eta\, \nabla \times \mathbf{B}^+) = \mathbf{B}^- - (1-\gamma)\, dt\, \nabla \times (\eta\, \nabla \times \mathbf{B}^-) \qquad (12)$$

has been chosen. Here the superscripts $-$ and $+$ refer to the magnetic field before and after the timestep, respectively. The time-centering parameter $\gamma$ is chosen to be 1 in regions where the Courant stability limit is exceeded by a sufficient margin, and 0 in regions where it is met by a like margin. Where the limit is marginal, $\gamma$ takes intermediate values. The ratio of the actual timestep to the local Courant stability limit,

$$C_\eta = \frac{\eta\, dt}{(dL)^2}, \qquad (13)$$

will be referred to as the local Courant number.
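A one-dimensional scalar analogue of Equation 12 shows the role of $\gamma$. The sketch assumes constant $\eta$, a uniform grid, $u = 0$ at both ends, and a single global $\gamma$ instead of the cell-by-cell choice described above. For $u_t = \eta u_{xx}$ the scheme reads $(I - \gamma C_\eta D^2)u^+ = (I + (1-\gamma) C_\eta D^2)u^-$, where $D^2$ is the three-point second difference and $C_\eta$ is the Courant number of Equation 13. Function names are hypothetical.

```python
import math


def gamma_step(u, courant, gamma):
    """One step of (I - g*C*D2) u+ = (I + (1 - g)*C*D2) u- with u = 0 at both ends.

    courant is eta*dt/dx**2 (Equation 13); D2 u_i = u_{i-1} - 2 u_i + u_{i+1}.
    """
    n = len(u)
    pad = [0.0] + u + [0.0]
    rhs = [pad[i + 1] + (1 - gamma) * courant * (pad[i] - 2 * pad[i + 1] + pad[i + 2])
           for i in range(n)]
    # Solve the tridiagonal system (diagonal b, both off-diagonals a) by the Thomas algorithm.
    a = -gamma * courant
    b = 1 + 2 * gamma * courant
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = a / b, rhs[0] / b
    for i in range(1, n):
        denom = b - a * cp[i - 1]
        cp[i] = a / denom
        dp[i] = (rhs[i] - a * dp[i - 1]) / denom
    out = [0.0] * n
    out[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        out[i] = dp[i] - cp[i] * out[i + 1]
    return out


n = 31
x = [(i + 1) / (n + 1) for i in range(n)]
u0 = [math.sin(math.pi * xi) + 0.1 * math.sin(31 * math.pi * xi) for xi in x]

results = {}
for gamma in (0.0, 0.5, 1.0):
    u = u0[:]
    for _ in range(20):
        u = gamma_step(u, courant=10.0, gamma=gamma)
    results[gamma] = max(abs(v) for v in u)
    print(gamma, results[gamma])
```

With $C_\eta = 10$ the explicit choice $\gamma = 0$ should blow up within a few steps, while $\gamma = 1/2$ and $\gamma = 1$ stay bounded. Even so, $\gamma = 1/2$ damps the highest mode only weakly, which is one reason to prefer $\gamma = 1$ where $C_\eta \gg 1$.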

Problems such as these commonly include regions where the local Courant number is essentially infinite. In this way the diffusion equations can be used to model magnetostatic fields in vacuum without calculating electromagnetic propagation. The time development of the fields in those regions is of little interest, so the loss of accuracy that comes with computing at the large local Courant numbers inherent in them is no concern. The problem of determining the solution to Equation 12 in those regions is clearly of the form

$$\nabla \times (\eta\, \nabla \times \mathbf{B}^+) = 0, \qquad (14)$$

since $\gamma = 1$ and $C_\eta = \infty$. Obviously, the magnetic field that results in these regions is the same as the steady-state field determined by the boundary conditions. Just as obviously, an elliptic solver is required.

Following Brackbill and Pracht [2], finite volume spatial differencing is used to determine the current density from the magnetic field by

$$\mathbf{J} = \nabla \times \mathbf{B} \qquad (15)$$

and to determine $\nabla \times (\eta \mathbf{J})$ in Equation 12. This ensures that the discretized field exactly satisfies Ampère's law

$$\int_S \hat n \cdot \mathbf{J}\, dA = \oint_{\partial S} \mathbf{B} \cdot d\mathbf{l} \qquad (16)$$

for any surface $S$ obtained by rotating a curve lying strictly on the grid
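about the central axis. Because the grid is not rectilinear, the difference formulae are complex. For example, with the magnetic field represented as a cell-centered quantity, its curl is a vertex-centered quantity given by

$$\nabla \times \mathbf{B} = \frac{1}{dV/d\theta} \sum_{\text{adjacent cells}} \left[ -\hat r\, da + \hat n\, \frac{dA}{d\theta} \right] \times \mathbf{B} \qquad (17)$$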


Here $\hat r$ is the unit vector in the radial direction. The $da$ are the areas of the shaded parallelograms in Figure 2; $dV/d\theta$ is the volume swept out by the four parallelograms as they are rotated by one radian about the axis of symmetry; and the $\hat n\, dA/d\theta$ are the directed areas swept out by the indicated diagonals as they are similarly rotated. These geometric quantities are sufficient to compute any vertex-centered difference of any cell-centered quantity. Most of them are estimated rather than computed exactly.


FIG.  2.  Projection  on  computational  plane  of  finite  volume  for  vertex 
centered  differences  of  cell  centered  quantities. 
4.  THE  ALGORITHM

To  implement  the  Full  Approximation  Storage  multigrid  scheme  in  a  multiblock 
architecture,  each  coarser  level  is  represented  as  a  separate  block  structure 
with  the  same  connectivity  as  that  of  the  finest  grid.  Thus  each  block  of  the 
finest  structure  is  represented  by  one  and  only  one  block  at  each  level.  The 
blocks  of  each  coarser  structure  are  coupled  by  the  same  block  coupling  as  the 
finest  block  structure.  So  that  corrections  on  one  level  may  move  from  block 
to  block  before  being  interpolated  to  another  level,  the  block  coupling  is 
carried  out  on  all  levels  during  every  computation  of  residuals  or 
corrections. 

Figure 3 shows a skeletal flow diagram of Brandt's Full Approximation Storage algorithm [3]. Six key pieces must be specified to flesh out this algorithm for any particular application: the three decision rules, the two interpolation rules, and the correction step. In the multiblock version, because of the complexity of the data structure and difference equations, those pieces have been made as simple as possible. After a brief mention of the standard aspects, the discussion will focus on the differences required by the multiblock architecture.


FIG.  3.  Brandt's  full  approximation  storage  algorithm. 
The decision rules are combined versions of those required to implement V-cycling and iteration to convergence. Error and convergence rate are both measured using the norm of the corrections. Convergence and slow convergence are both determined by comparison with user-preset tolerances. Little work has gone into studying the effect of these criteria, or into improving on them, because the preferred mode of operation has been V-cycling with a fixed number of correction steps per level. Even so, in cases with Courant numbers of order
$10^5$, asymptotic convergence factors of 0.8 per work unit were achieved on low-aspect-ratio grids.

The coarse-to-fine interpolation is done using bilinear interpolation. The cell centers of the fine grid are located in the coarse cell-centered grid once, and the indices of the cell in which each is found are saved. A distorted grid with high-aspect-ratio cells can cause a fine-grid cell center to lie outside the grid formed by the coarse-grid cell centers. This limits the grids which can be used, and would restrict the grid movement required for adaptive calculations.
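The locate-once, interpolate-many idea can be sketched under simplifying assumptions that are ours, not the paper's. The coarse grid is uniform and rectilinear with spacing $H$ and cell centers at $(I - 1/2)H$. It carries one ghost layer on every side, so that fine centers near the block edge still have coarse neighbors, as the ghost-cell step of Figure 5 arranges. Refinement is 2:1. In this setting, locating a fine center is a floor operation, and the saved (index, weight) table is reused for every interpolation. Names are hypothetical.

```python
import math


def locate(n_fine, H):
    """Once per grid pair: coarse index I and weight t for each fine centre.

    Coarse centres sit at (I - 1/2) H for I = 0..NC+1 (I = 0 and NC+1 are ghosts);
    fine centres sit at (i + 1/2) h with h = H / 2.
    """
    h = H / 2.0
    table = []
    for i in range(n_fine):
        x = (i + 0.5) * h
        I = math.floor(x / H + 0.5)        # largest I with (I - 1/2) H <= x
        t = (x - (I - 0.5) * H) / H        # position between centres I and I+1
        table.append((I, t))
    return table


def bilinear(coarse, table_x, table_y):
    """Interpolate coarse values (including ghosts) to every fine cell centre."""
    fine = []
    for I, tx in table_x:
        row = []
        for J, ty in table_y:
            row.append((1 - tx) * (1 - ty) * coarse[I][J]
                       + tx * (1 - ty) * coarse[I + 1][J]
                       + (1 - tx) * ty * coarse[I][J + 1]
                       + tx * ty * coarse[I + 1][J + 1])
        fine.append(row)
    return fine


def g(x, y):
    """Linear test function, reproduced exactly by bilinear interpolation."""
    return 1.0 + 2.0 * x + 3.0 * y


NC, H = 4, 0.25
centres = [(I - 0.5) * H for I in range(NC + 2)]      # includes both ghost centres
coarse = [[g(xc, yc) for yc in centres] for xc in centres]
table = locate(2 * NC, H)                             # computed once, then reused
fine = bilinear(coarse, table, table)
h = H / 2
err = max(abs(fine[i][j] - g((i + 0.5) * h, (j + 0.5) * h))
          for i in range(2 * NC) for j in range(2 * NC))
print(err)
```

Bilinear interpolation reproduces linear functions, so the printed error should be at rounding level.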

The fine-to-coarse interpolation is done by simple injection. This is satisfactory only in regions where the solution varies slowly. Results in regions where the solution varies more rapidly would be improved by area-weighted averaging.
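The difference between the two restrictions is easy to see for cell-centered data with 2:1 coarsening, where each coarse cell covers a 2×2 group of fine cells. In the sketch, injection is assumed to copy one fine cell of each group, and the area weights are uniform. On the paper's quadrilateral grids, the actual cell areas would serve as the weights. Names are hypothetical.

```python
def inject(fine):
    """Copy the lower-left fine cell of each 2x2 group (assumed convention)."""
    return [[fine[2 * I][2 * J] for J in range(len(fine[0]) // 2)]
            for I in range(len(fine) // 2)]


def area_average(fine, area):
    """Area-weighted average of each 2x2 group of fine cells."""
    out = []
    for I in range(len(fine) // 2):
        row = []
        for J in range(len(fine[0]) // 2):
            cells = [(2 * I + a, 2 * J + b) for a in (0, 1) for b in (0, 1)]
            total = sum(area[i][j] for i, j in cells)
            row.append(sum(area[i][j] * fine[i][j] for i, j in cells) / total)
        out.append(row)
    return out


n, h = 8, 1.0 / 8
fine = [[(i + 0.5) * h for j in range(n)] for i in range(n)]   # u = x at fine centres
area = [[h * h] * n for _ in range(n)]
exact = [(I + 0.5) * 2 * h for I in range(n // 2)]             # u = x at coarse centres
print(inject(fine)[1][0], area_average(fine, area)[1][0], exact[1])
```

For $u = x$ the area-weighted average reproduces the value at the coarse cell center. The injected value is off by half a fine cell width, a first-order error that becomes negligible only where the solution varies slowly.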

The correction, or smoothing, step uses a simple block Jacobi iteration. The calculation of the approximate inverse ignores the boundary conditions entirely, so their effect is felt only through the residual calculation. There are three reasons for using this rather inefficient smoother. First, as mentioned above, the complexity of the data structure and difference equations militates against additional complexity in the algorithm. Second, simultaneous correction is easily synchronized within blocks. Third, it preserves symmetric solutions, a property which makes coding blunders easier to find.

The only complication in the smoother is that applying the full correction makes the iteration diverge for many problems of interest. Apparently, the iteration matrix thus constructed has norm greater than one. Fortunately, this is easily overcome by including a relaxation factor. Ideally, that relaxation factor would be determined by computing the spectrum of the iteration matrix. Here, however, the nonrectilinearity of the numerically generated grids makes this impossible. Instead, the relaxation factor is determined by estimating the norm of the iteration matrix A from its action on selected test functions X. The test functions, two functions representing errors typical of inhomogeneous boundary value problems, are given by


$$E_{\text{col}}(i,j) = \delta_{iI}, \qquad E_{\text{row}}(i,j) = \delta_{jJ}.$$


These functions are zero except within a specific row or column, indicated by $I$ or $J$, where they are 1. For either of these errors, the norm of the correction depends on two quantities. The first is the transverse Courant number $C_T$, defined using Equation 13 with $dL$ taken as the thickness of the cell transverse to the row or column. The second is the aspect ratio of the cell, $a = dL/dl$, where $dl$ is the cell dimension along the row or column. With the reciprocal of this estimate,


$$\frac{C_T\,(2C_T + 1)}{2C_T^2 + (2C_T + 1)^2}\,(1 + a^2)$$


used  cell  by  cell  as  a  local  relaxation  factor,  the  iteration  has  proven  to  be 
convergent  in  a  wide  range  of  problems. 
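Below is a minimal damped-Jacobi sweep of the kind described, for the model implicit operator $(1 + 4C)u_{ij} - C(u_{i\pm1,j} + u_{i,j\pm1}) = r_{ij}$ on a uniform square block whose ghost layer is held at zero. The per-cell array `omega` stands in for the local relaxation factor. The paper's formula is not reproduced; the uniform value 0.8, the model operator, and all names are assumptions. The final check illustrates the third reason given earlier for choosing Jacobi: a simultaneous update of symmetric data stays exactly symmetric.

```python
def jacobi_sweep(u, rhs, C, omega):
    """One simultaneous sweep for (1 + 4C) u - C * (sum of 4 neighbours) = rhs.

    u carries a ghost layer held at zero; omega[i][j] is a per-cell relaxation
    factor applied to the diagonal (Jacobi) correction.
    """
    n = len(u) - 2
    new = [row[:] for row in u]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            neighbours = (u[i - 1][j] + u[i + 1][j]) + (u[i][j - 1] + u[i][j + 1])
            residual = rhs[i][j] - ((1 + 4 * C) * u[i][j] - C * neighbours)
            new[i][j] = u[i][j] + omega[i][j] * residual / (1 + 4 * C)
    return new


n, C = 9, 50.0
u = [[0.0] * (n + 2) for _ in range(n + 2)]
rhs = [[1.0 if 1 <= i <= n and 1 <= j <= n else 0.0 for j in range(n + 2)]
       for i in range(n + 2)]
omega = [[0.8] * (n + 2) for _ in range(n + 2)]
for _ in range(200):
    u = jacobi_sweep(u, rhs, C, omega)

# Mirror and transpose symmetry of the iterate, checked for exact equality.
symmetric = all(u[i][j] == u[n + 1 - i][j] == u[i][n + 1 - j] == u[j][i]
                for i in range(n + 2) for j in range(n + 2))
print(symmetric)
```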

Using Jacobi iteration for the smoothing causes difficulties when the local aspect ratio of cells is far from one and the primary diffusion direction lies along the long axis of the cells. The iteration must then reduce errors whose gradients lie principally parallel to that long axis. If the Courant numbers based on the local cell dimensions, $C_\parallel$ and $C_\perp$, are very large, then

$$C_\perp > C_\parallel \gg 1. \qquad (21)$$

In this case the iteration matrix reduces the principal errors by only a very small amount. This is well-known behavior for Jacobi iteration on high-aspect-ratio grids, and the usual remedy is line relaxation. For this reason, it may be necessary to find an effective way to implement line relaxation in the multiblock architecture.

Organizing the code is a nontrivial task that requires some elaboration. The loop over the block index is placed at almost the lowest level possible: just above the loops over cells. Hence, with (i,j) indexing cells and l indexing blocks, the structure of most loops is essentially as shown here:

      DO 100 L=1,LMAX
        DO 100 J=1,JMAX
          DO 100 I=1,IMAX
C           . . . difference equations for cell (I,J) of block L . . .
  100 CONTINUE


In applications to large problems with a wide range of block sizes, the amount of storage allotted but not used in the small blocks may become prohibitively large. If the code can adjust its memory length and the dimensions of all the arrays to suit the specific problem at execution time, this problem can be avoided. However, this change significantly complicates the simple form of the loops above. The arrays in the code then show only the subscripts (i,j) explicitly. The third index, l, is changed by changing the base pointer of the array, a trick made simpler by the Cray Fortran POINTER statement:

      POINTER (KPX, X(IMAX,JMAX))


When a subroutine containing the above declaration is entered, the base pointer of the array X is set to the value of the integer KPX, and the dimensions are set as indicated. Both the pointer KPX and the dimensions must be dummy arguments or common block variables. The loop structure is then as shown in Figure 4. The subroutine SETBLK changes the values of the pointer KPX and of the dimensions.
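Python has no Cray pointers, but the effect of POINTER and SETBLK can be imitated with one flat storage array, per-block base offsets, and Fortran-style column-major addressing. The class and method names (including `set_block` as a stand-in for SETBLK) are hypothetical, and the sketch is an analogy rather than a transcription of the Fortran. Each block occupies exactly $i_{\max}^l \times j_{\max}^l$ slots, so no storage is wasted on small blocks.

```python
from array import array


class BlockStorage:
    """All blocks of one field packed into one flat array (column-major, 1-based)."""

    def __init__(self, shapes):
        self.shapes = shapes                    # [(imax_l, jmax_l), ...]
        self.bases = []
        total = 0
        for imax, jmax in shapes:
            self.bases.append(total)
            total += imax * jmax
        self.data = array("d", [0.0] * total)
        self.set_block(1)

    def set_block(self, l):
        """Analogue of SETBLK: move the base offset and dimensions to block l."""
        self.kpx = self.bases[l - 1]
        self.imax, self.jmax = self.shapes[l - 1]

    def index(self, i, j):
        """Flat position of X(I,J) in the current block, as Fortran would compute it."""
        return self.kpx + (i - 1) + (j - 1) * self.imax

    def __getitem__(self, ij):
        return self.data[self.index(*ij)]

    def __setitem__(self, ij, value):
        self.data[self.index(*ij)] = value


x = BlockStorage([(8, 4), (3, 4)])
for l in (1, 2):
    x.set_block(l)
    for j in range(1, x.jmax + 1):        # J outer, I inner: unit stride in memory
        for i in range(1, x.imax + 1):
            x[i, j] = 10 * l + i
print(len(x.data), x.index(1, 1))
```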

Block-level pseudo-code for the coarse-to-fine interpolation is shown in Figure 5. Each of the indicated operations takes place in a double loop over (i,j) inside a subroutine called from the subroutine MOVFINE shown. Each loop over blocks begins with a call to SETBLK. The block loops here cannot be combined: the interpolation requires data to flow across block edges, and that data must be computed in the neighbor blocks before it can be moved. Code written for the multiblock architecture therefore looks very much like code for a multiprocessor. In both cases, synchronization of the data in the separate blocks is a crucial consideration.

      SUBROUTINE RESID
C     . . .
      DO 100 L=1,LMAX
        CALL SETBLK(L)
        CALL RSDBLK
  100 CONTINUE
      RETURN
      END


      SUBROUTINE RSDBLK
      COMMON KPX
      POINTER (KPX, X(IMAX,JMAX))
C     . . .
      DO 100 J=1,JMAX
        DO 100 I=1,IMAX
C         . . . difference equations for residual . . .
  100 CONTINUE

FIG.  4.  Code  structure  required  by  pointered  doubly  subscripted  arrays. 
MOVFINE

  do over blocks
    set array base pointers and dimensions
    inject B_h from fine to coarse
    compute C = B_H - (injected B_h) on coarse
  do over blocks
    set array base pointers and dimensions
    set ghost cell data for interpolation (either share or create it)
  do over blocks
    set array base pointers and dimensions
    interpolate corrections to fine grid
    apply them to B

FIG. 5. Block-level pseudo-code for interpolation from coarse to fine grid.

5.  EXAMPLE 

Figure 1 shows a modestly complex initial boundary value problem for $B_\theta$. The boundary condition at the top forces a current to flow in through the edge at right and out through the edge at top left. Initially, $B_\theta$ is zero throughout the region. The long-time behavior of the solution is shown in Figures 6-8.
almost exclusively in the region of lower resistivity. The electric field vector plot (Figure 8) shows that the field is continuous at both edges of the region of lower resistivity. This indicates that the central region is also carrying current, though at so low a density that it does not show up on the current plot. The contour lines of $B_\theta$ (Figure 7) show the proper $1/r$ behavior, and show that $B_\theta$ drops to zero through the more conducting region. Further, since $\mathbf{E} = \eta\,(\nabla \times \mathbf{B})$, this indicates that $\mathbf{B}$ is being computed with $C^1$ accuracy.

6.  COMMENTARY 

Why would anyone bother with the multiblock approach? The answer lies in the nature of the code needed to do finite difference calculations in complex geometry: it must contain a lot of logical tests. This is unfortunate for today's vector machines, where logic in difference equations should be avoided. Furthermore, complexity and logic in difference equations breed coding blunders. The multiblock approach makes simplicity out of complexity by a divide-and-conquer strategy. It pulls the logic out of the difference equations, making it possible for the compiler to vectorize them and making it much easier to code them correctly. Hence, it offers a compromise solution to both the vectorization problem and the complexity problem.

FIG. 8. Example problem: Electric Field.

Multigrid algorithms and the multiblock architecture seem to be a good match for a number of reasons. First, other elliptic solvers do not fit the data structure very well. The standard for implicit time stepping is probably the alternating direction implicit (ADI) method, but the block edges stop the data flow. The best solvers are probably conjugate gradient based, but the equations are not easily constructed for this data structure without including some extra equations that express the equality of the overlapped cells. Of course, high-aspect-ratio grids may force the use of alternating direction line relaxation, in which case some of this advantage is lost.
Second, the coarser-level grids and their data can easily be stored as additional blocks, so the same subroutines can be used to compute the residuals and corrections at all levels. Thus, if an explicit time-stepping routine exists, much of its code can be reused with very little modification in the residual computation.
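A minimal Python sketch of the driver/kernel split discussed above may help; it is an illustration, not the authors' Fortran. The block dictionary and the names `make_block`, `residual_kernel` and `residual_all_blocks` are hypothetical, the ghost layer is filled directly from a test function instead of being shared with neighboring blocks, and plain Python lists are used only for clarity (they do not vectorize). The point is that the kernel contains no geometric tests, and that a coarser level stored as one more block goes through exactly the same kernel.

```python
def make_block(nx, ny, x0, y0, h, u):
    """Hypothetical block: nx-by-ny interior nodes plus one ghost layer on each side.

    Assumption: the ghost layer is filled straight from u(x, y); a real multiblock
    code would copy or share it from neighboring blocks instead.
    """
    return {
        "h": h,
        "u": [[u(x0 + (i - 1) * h, y0 + (j - 1) * h) for j in range(ny + 2)]
              for i in range(nx + 2)],
    }


def residual_kernel(block, f):
    """r = f - (5-point Laplacian of u) on the interior; no geometric tests inside."""
    u, h = block["u"], block["h"]
    nx, ny = len(u) - 2, len(u[0]) - 2
    return [[f - (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                  - 4.0 * u[i][j]) / (h * h)
             for j in range(1, ny + 1)]
            for i in range(1, nx + 1)]


def residual_all_blocks(blocks, f):
    """Driver loop over blocks: the role played by the outer loop of Fig. 4."""
    return [residual_kernel(block, f) for block in blocks]


def quadratic(x, y):
    return x * x + y * y          # its Laplacian is exactly 4


fine = [make_block(4, 4, 0.0, 0.0, 0.25, quadratic),
        make_block(4, 4, 1.0, 0.0, 0.25, quadratic)]
coarse = [make_block(2, 2, 0.0, 0.0, 0.5, quadratic)]   # a coarser level stored as one more block
residuals = residual_all_blocks(fine + coarse, 4.0)
```

With u = x² + y² and f = 4 the 5-point residual vanishes, up to round-off, on every block, fine or coarse.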
Introduction

The  physical  problem  of  the  formation  and  evolution  of  vortices  in  fluids  is  of  great  scientific 
interest.  Vortices  are  known  to  form  in  corners  and  near  separation  points.  The  structure 
of  these  vortices  then  governs  the  development  of  the  flow. 

Mathematical asymptotics and physical experiments indicate the existence of a sequence, possibly infinite, of vortices descending into a corner. For Stokes flow, using a multigrid localization scheme, we have now found more than twenty of these vortices. Our resolution is obtained by appropriate control of the residual and would appear to be limited only by the global solution's accuracy, the initial grid size and the precision of the machine used.

A multigrid grid generation procedure then allows the construction of orthogonal boundary-fitted coordinates about airfoils and other geometries exhibiting interesting subvortical detail. This procedure, coupled with a multigrid orthogonal Poisson solver, is advantageous for unsteady flow computations.

We  conclude  with  computational  results  for  flow  about  an  NACA  0015  airfoil. 
Vortex Structures and Dynamics of Flows


1.  Multigrid  Localization 

The formation and evolution of fine vortex structure reflect the large number of degrees of freedom inherent in solutions of the Navier-Stokes equations. Even simple linear Stokes flow, governed by the biharmonic operator, exhibits a surprising complexity of scales.

In order to investigate both the computational and the applied aspects of subvortex structure, we have developed a robust multigrid nested subdomain scheme. This scheme was applied to Stokes flow in a unit cavity. Convergence was excellent, and we were able [1] to report ten corner subvortices for the first time, by far the best resolution reported to that date. These were obtained in a vectorized run on the Cyber 205 with a discretization of 129 by 129 points and 40 nested subdomains, in under 15 CPU seconds.

The results in [1] were limited by machine precision (there is a rapid O(10⁻⁴) falloff in subvortex intensity), by a glitch in the local 205 software that prevented efficient vectorized grid lengths finer than 2^16-1^/2, and by a rapid buildup in the number of iterations needed as the subdomain size decreased while nesting down into the corner.


1.1  New  Fine  Vortex  Resolution 

Here we report the resolution of twenty-five corner subvortices [2]. The precision and grid length limitations mentioned above were easily overcome. However, the rapid increase in the number of iterations needed is due to a rapidly growing residual, whose effect in going from, say, the fifteenth to the twenty-fifth vortex in the corner hierarchy is considerable. Table 1 illustrates this problem. For these results a discretization parameter of M = 5 was used. The number of grid points on a side of the computational domain is given by NP = 2^M + 1 (here 33). As seen in [1], the grid size is less important to the resolution of the full vortex sequence than is the residual control mentioned above.

Note from Table 1 that the subvortices occur in an (essentially self-similar) sequence of subdomains whose scale shrinks by a factor close to 2⁻⁴. One could take this into account in assigning the nested subdomain decomposition. Our scheme simply halved the domain size and then reapplied our residual-controlled multigrid scheme. Thus we encountered a new vortex subdomain after (roughly) every four localizations.
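The localization arithmetic can be checked in a few lines. This Python sketch assumes, as the text suggests, that each localization halves the subdomain and that successive vortices shrink by about 2⁻⁴; the helper names are hypothetical.

```python
import math


def grid_points(M):
    """Grid points on a side of the computational domain: NP = 2**M + 1."""
    return 2 ** M + 1


def localizations_per_vortex(scale_ratio, shrink=0.5):
    """Number of domain halvings that shrink the subdomain by one vortex scale ratio."""
    return round(math.log(scale_ratio) / math.log(shrink))


def vortices_reached(n_localizations, per_vortex):
    """Vortex scales passed after n_localizations nested subdomains."""
    return n_localizations // per_vortex


per_vortex = localizations_per_vortex(2.0 ** -4)
print(grid_points(5), per_vortex, vortices_reached(40, per_vortex))   # 33 4 10
```

With the 40 nested subdomains of [1], this rough count gives ten vortex scales, which is at least consistent with the ten subvortices reported there.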


1.2  Discussion  of  Needed  Further  Research 

Our present scheme does not yet employ any back-and-forth interaction between local and global subdomains. This no doubt accounts for most of our difficulty in maintaining accuracy on the subdomains. Because the problem is basically biharmonic, the analytic theory of these vortex subdomains is also, generally speaking, incomplete on this point. Further research on appropriate boundary values, optimal grid transfer stencils, parallel subdomain processing, and the domain decomposition overlap needed would be important.

Table 1: Local Maximum Stream Function Intensities
and Associated Residual and Iterations [2].

Vortex   Intensity          Residual
   1      0.996 × 10^-1     0.88 × 10^…
   2     -2.29 × 10^…       0.97 × 10^…
   3      6.55 × 10^…       0.90 × 10^…
   4     -1.87 × 10^…       0.94 × 10^…
   5      5.33 × 10^…       0.95 × 10^…
   6     -1.52 × 10^…       0.95 × 10^…
   7      4.34 × 10^…       0.98 × 10^…
   8     -1.24 × 10^…       0.98 × 10^…
   9      3.53 × 10^…       0.97 × 10^…
  10     -1.01 × 10^…       0.94 × 10^…
  11      2.88 × 10^…       0.88 × 10^…
  12     -8.19 × 10^…       0.95 × 10^…
  13      2.34 × 10^…       0.85 × 10^…
  14     -6.69 × 10^…       0.85 × 10^…
  15      1.91 × 10^…       0.95 × 10^…
  16     -5.45 × 10^…       0.15 × 10^…
  17      1.55 × 10^…       0.30 × 10^…
  18     -4.42 × 10^…       0.61 × 10^…
  19      1.27 × 10^…       0.12 × 10^…
  20     -3.61 × 10^…       0.24 × 10^…
  21      1.03 × 10^…       0.48 × 10^…
  22     -2.94 × 10^…       0.19 × 10^…
  23      8.39 × 10^…       0.19 × 10^…
  24     -2.40 × 10^…       0.78 × 10^…
  25      6.83 × 10^…       0.15 × 10^…
  26     -1.95 × 10^…       0.31 × 10^…

To be more precise, the following algorithm should be investigated. Think first in terms of the simpler Poisson problem Lu = f, where L denotes the Laplacian operator on a convenient domain, e.g., the unit square. Assuming that one has a nested domain decomposition such as that of Figure 2, and that one has global and local solutions u_g and u_l respectively, the data flow of our proposed local-to-global interaction algorithm is:


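$$
\begin{aligned}
&\text{(local)}  && u_l \;\to\; r_l = f - L u_l \\
&\text{(global)} && r_g = I_l^g\, r_l \ \text{on member gridpoints}, \qquad r_g = 0 \ \text{on non-member gridpoints} \\
&\text{(global)} && L_g v = f_g + r_g, \qquad u_c = v - u_g \\
&\text{(local)}  && u_d = I_g^l\, u_c \;\to\; u_l = u_l + u_d
\end{aligned}
$$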
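As a hedged illustration, here is a one-dimensional Python analogue of the proposed data flow for u'' = f on [0, 1], with a local subdomain [0, 1/2] whose grid is one level finer than the global grid. Injection for I_l^g, linear interpolation for I_g^l, and two Gauss-Seidel sweeps as the (deliberately inexact) local solver are assumptions of this sketch, as are all function names; it is not a tested proposal of the authors.

```python
import math


def solve_poisson_1d(f, h, left, right):
    """Solve (u[i-1] - 2u[i] + u[i+1]) / h**2 = f[i] for the interior values
    (Thomas algorithm); returns the full vector including both end values."""
    n = len(f)
    rhs = [fi * h * h for fi in f]
    rhs[0] -= left
    rhs[-1] -= right
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = -0.5, rhs[0] / -2.0
    for i in range(1, n):
        m = -2.0 - cp[i - 1]
        cp[i] = 1.0 / m
        dp[i] = (rhs[i] - dp[i - 1]) / m
    u = [0.0] * n
    u[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        u[i] = dp[i] - cp[i] * u[i + 1]
    return [left] + u + [right]


def residual(u, f, h):
    """r = f - L u at interior points; the end entries stay zero."""
    r = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        r[i] = f[i] - (u[i - 1] - 2.0 * u[i] + u[i + 1]) / (h * h)
    return r


def prolong(coarse):
    """Linear interpolation onto a grid with half the spacing."""
    fine = []
    for a, b in zip(coarse, coarse[1:]):
        fine += [a, 0.5 * (a + b)]
    return fine + [coarse[-1]]


def source(x):
    return -math.pi ** 2 * math.sin(math.pi * x)


def local_global_step(ng=8, sweeps=2):
    """One pass of the local-to-global data flow (hypothetical 1D setting)."""
    H = 1.0 / ng
    h = H / 2.0                                   # local grid is one level finer
    m = ng // 2                                   # global points 0..m cover [0, 1/2]
    fg = [source(j * H) for j in range(ng + 1)]
    u_g = solve_poisson_1d(fg[1:-1], H, 0.0, 0.0)

    fl = [source(i * h) for i in range(2 * m + 1)]
    u_l = prolong(u_g[: m + 1])                   # initial guess; end values come from u_g
    for _ in range(sweeps):                       # assumed inexact local solve
        for i in range(1, len(u_l) - 1):
            u_l[i] = 0.5 * (u_l[i - 1] + u_l[i + 1] - h * h * fl[i])

    r_l = residual(u_l, fl, h)                                         # (local)  r_l = f - L u_l
    r_g = [r_l[2 * j] if 0 < j < m else 0.0 for j in range(ng + 1)]    # (global) inject, 0 off-member
    v = solve_poisson_1d([fg[j] + r_g[j] for j in range(1, ng)], H, 0.0, 0.0)  # L_g v = f_g + r_g
    u_c = [vj - gj for vj, gj in zip(v, u_g)]                          # (global) u_c = v - u_g
    u_d = prolong(u_c[: m + 1])                                        # (local)  u_d = I_g^l u_c
    u_l = [a + b for a, b in zip(u_l, u_d)]                            # (local)  u_l = u_l + u_d
    return H, r_g, u_c, u_l
```

By linearity, the computed correction satisfies L_g u_c = r_g, which is exactly what the global step is meant to deliver.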


For the biharmonic equations of the fluid corner subdomains one would have a similar scheme, although a number of interesting new domain decomposition questions concerning information exchange already arise:

i. Can we seek corrections to ω and ψ simultaneously? Recall that there is a strong coupling between the ψ and ω equations, because the boundary conditions on the vorticity ω depend on interior values of the stream function ψ.

ii. Need we correct just boundary points (as in FAC), or also some or all interior points, and how may we transfer the data to them most efficiently? In particular, in a parallel processing environment such as the hypercube, how do we optimally allocate the computations?

iii. What are the trade-offs between fully overlapped processing (e.g., keeping each CPU fully occupied) and best parallelism in information passing?

Improved knowledge gained from the investigation of this algorithm would be invaluable in application to other important geometries exhibiting subvortical subdomain structures.


2.  Multigrid  Grid  Generation 

Many other important physical domains may be treated by a multigrid localization scheme patterned after the method described above. In particular, we may consider unsteady flow over an airfoil. In order to study this and other geometries, a numerical grid generation procedure must be employed. There is a large variety of grid generation techniques, including partial differential equation methods, algebraic methods and conformal transformations. We selected an elliptic partial differential equation method which allows the construction of orthogonal coordinates on infinite domains. The selection of an elliptic generation technique was also motivated by the aim of our research, namely, the solution of the unsteady Navier-Stokes equations which govern unsteady flows in the laminar flow regime. The calculation of developing flow patterns governed by these equations requires that Poisson's equation be solved at each time step of the computation. The Poisson equation is, by definition, the generation equation for the coordinates of a grid system when elliptic grid generation techniques are used. Thus, efforts to make the solution of the generation equations more efficient also increase the efficiency of the Navier-Stokes solution procedure.

The problem of grid generation around airfoils also requires a method that allows the calculation of boundary-fitted coordinates. This simply means that the method must allow a prescribed distribution of coordinate nodes along the boundary of the grid, so that airfoil domains of a given shape can be generated. The most popular of such methods is the numerical technique of Thompson et al. [3], which allows the construction of a non-orthogonal, boundary-fitted coordinate system using elliptic generation equations. While the non-orthogonality of the method is not a severe drawback, it does require more computational work than comparable orthogonal systems, due to the extra (cross-derivative) terms in the transformed partial differential equations, which require a nine-point stencil instead of the five-point stencil used on orthogonal systems. Also, extreme non-orthogonality can affect the truncation error of a solution. The grid generation technique of Ryskin and Leal [4] allows the construction of orthogonal boundary-fitted coordinate systems when the weak constraint form of their method is employed. This method is relatively straightforward, yet quite robust. In addition, it includes a procedure for the construction of infinite coordinate systems. Using this procedure, the difficulties associated with outflow boundary conditions on conventional domains can be avoided. A comparison of these two methods is found in [5].

The orthogonal grid generation equations, being elliptic, may of course be solved numerically by a variety of schemes. In [4] an ADI method was employed, whereas in [6] an SOR scheme was used. Here we describe a multigrid scheme which, as mentioned above, also has the advantage of providing efficient solutions to the flow equations themselves. It has enabled [7] remarkable agreement with visualizations of physical flows about airfoils [8].


2.1  Equations  Defining  the  Mapping 


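In order to construct an orthogonal grid system, a coupled system of two nonlinear partial differential equations may be employed. The solutions of these equations define the mapping from the x, y physical domain to the rectangular ξ, η computational domain. According to [4], the equations defining the mapping are:

$$\frac{\partial}{\partial\xi}\left(f\,\frac{\partial x}{\partial\xi}\right) + \frac{\partial}{\partial\eta}\left(\frac{1}{f}\,\frac{\partial x}{\partial\eta}\right) = 0 \qquad (1)$$

$$\frac{\partial}{\partial\xi}\left(f\,\frac{\partial y}{\partial\xi}\right) + \frac{\partial}{\partial\eta}\left(\frac{1}{f}\,\frac{\partial y}{\partial\eta}\right) = 0 \qquad (2)$$

where the distortion function is given by

$$f(\xi,\eta) = \frac{h_2}{h_1} = \frac{\left(x_\eta^2 + y_\eta^2\right)^{1/2}}{\left(x_\xi^2 + y_\xi^2\right)^{1/2}} \qquad (3)$$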
with h1 and h2 denoting the scale factors in the ξ and η directions, respectively. The distortion function specifies the ratio of the sides of a small rectangle in the auxiliary domain which is the image of a small square in the computational domain. The usual subscript notation is employed for partial derivatives. All equations were discretized using second-order accurate difference formulas, and the coefficients can be precalculated and stored for computational efficiency.


2.2  Solution  Procedure 


The weak constraint method of [4] allows the generation of orthogonal grids with a prescribed boundary correspondence. In order to avoid overspecification of the mathematical problem, the distortion function in the interior of the domain is not calculated directly from its definition, but rather found by interpolation from its boundary values. We remark that direct specification of f at interior points from its definition above (i.e., the strong constraint method [4]), although suitable for free boundary problems, does not permit the complete boundary correspondence needed here. The form of the interpolation formula used is problem- and domain-dependent and will be discussed when the specific grid generation problems are addressed. The interpolation formulas are also used to provide the initial guesses for x and y at the beginning of the solution procedure.

The general solution procedure for calculation of an orthogonal grid is as follows:

1. Set x and y values on the domain boundary subject to the boundary correspondence.

2. Interpolate initial guesses for x and y from the boundary values.

3. Calculate f on the boundary using the current x and y solution, and apply the interpolation formula to set f on the interior of the domain.

4. Using an appropriate numerical technique, calculate an approximation to the solution of the grid generation equations.

5. Check the convergence criterion and repeat steps 3 through 5 if necessary.

The convergence of the procedure is checked by evaluating the deviation of the solution from orthogonality. As was shown in [6], this can be done using the expression:

$$\cos\theta = \frac{x_\xi x_\eta + y_\xi y_\eta}{\left[\left(x_\xi^2 + y_\xi^2\right)\left(x_\eta^2 + y_\eta^2\right)\right]^{1/2}} \qquad (4)$$

The values of θ are calculated at all grid points, and then the deviation from orthogonality |π/2 − θ| is calculated. The maximum value found on the domain reflects the orthogonality of the grid and is used as the convergence criterion. This is called the maximum deviation from orthogonality, MDO.
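A small Python sketch of the orthogonality check of Eq. (4), together with the distortion function of Eq. (3), both evaluated with second-order central differences. The grid layout (x[i][j] with i along ξ and j along η), the helper names and the three test mappings are assumptions made for illustration.

```python
import math


def metrics(x, y, i, j, dxi, deta):
    """Central differences x_xi, y_xi, x_eta, y_eta at interior node (i, j)."""
    x_xi = (x[i + 1][j] - x[i - 1][j]) / (2.0 * dxi)
    y_xi = (y[i + 1][j] - y[i - 1][j]) / (2.0 * dxi)
    x_eta = (x[i][j + 1] - x[i][j - 1]) / (2.0 * deta)
    y_eta = (y[i][j + 1] - y[i][j - 1]) / (2.0 * deta)
    return x_xi, y_xi, x_eta, y_eta


def mdo_degrees(x, y, dxi, deta):
    """Max over interior nodes of |pi/2 - theta|, theta from Eq. (4), in degrees."""
    worst = 0.0
    for i in range(1, len(x) - 1):
        for j in range(1, len(x[0]) - 1):
            x_xi, y_xi, x_eta, y_eta = metrics(x, y, i, j, dxi, deta)
            cos_t = (x_xi * x_eta + y_xi * y_eta) / math.sqrt(
                (x_xi ** 2 + y_xi ** 2) * (x_eta ** 2 + y_eta ** 2))
            theta = math.acos(max(-1.0, min(1.0, cos_t)))   # clamp round-off
            worst = max(worst, abs(math.pi / 2 - theta))
    return math.degrees(worst)


def distortion(x, y, i, j, dxi, deta):
    """Eq. (3): f = h2 / h1 at an interior node."""
    x_xi, y_xi, x_eta, y_eta = metrics(x, y, i, j, dxi, deta)
    return math.hypot(x_eta, y_eta) / math.hypot(x_xi, y_xi)


def grid(mapping, n, dxi, deta):
    """Sample (x, y) = mapping(xi, eta) on an (n+1) x (n+1) computational grid."""
    pts = [[mapping(i * dxi, j * deta) for j in range(n + 1)] for i in range(n + 1)]
    return [[p[0] for p in row] for row in pts], [[p[1] for p in row] for row in pts]


n = 8
polar = grid(lambda xi, eta: ((1 + xi) * math.cos(eta), (1 + xi) * math.sin(eta)),
             n, 1.0 / n, (math.pi / 2) / n)
sheared = grid(lambda xi, eta: (xi + 0.5 * eta, eta), n, 1.0 / n, 1.0 / n)
stretched = grid(lambda xi, eta: (2.0 * xi, 3.0 * eta), n, 1.0 / n, 1.0 / n)

mdo_polar = mdo_degrees(*polar, 1.0 / n, (math.pi / 2) / n)
mdo_sheared = mdo_degrees(*sheared, 1.0 / n, 1.0 / n)
f_stretched = distortion(*stretched, 4, 4, 1.0 / n, 1.0 / n)
```

The polar grid is orthogonal, so its MDO is zero up to round-off; the sheared grid lines meet at about 63.4° everywhere, giving an MDO of about 26.6°, and the stretched grid has f = 3/2.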


2.3 Model Cavity Domain

In order to develop a multigrid grid generation solver, a cavity domain with a concave side was selected as a model problem. Similar domains have been studied using this orthogonal grid generation scheme, but with alternate methods used to solve the mapping equations [4], [6]. These provide a basis for comparison with the multigrid method.

A model domain with an interpolated initial guess and the calculated solution is shown in Figure 1. The function used to determine the shape of the left-hand side of the domain is x = d(1 − cos(2πy)), for 0 ≤ y ≤ 1. By varying d, the concavity of the domain can be changed.

Employing the weak constraint method, the x and y values are set on the cavity boundary. It is then necessary to define a suitable interpolation for the values of the distortion function on the interior of the domain. Following the interpolation used in [4], [6], the values were calculated using:

$$
\begin{aligned}
f(\xi,\eta) ={}& (1-\xi)\,f(0,\eta) + \xi\,f(1,\eta) + (1-\eta)\,f(\xi,0) + \eta\,f(\xi,1) \\
&- (1-\xi)(1-\eta)\,f(0,0) - (1-\xi)\,\eta\,f(0,1) \\
&- \xi\,(1-\eta)\,f(1,0) - \xi\,\eta\,f(1,1)
\end{aligned}
\qquad (5)
$$
To employ multigrid as the solution method for the mapping equations, a suitable coarse grid operator must be found. Several representations of the coefficients on the coarser grids were tested. The most efficient method, which gives a suitable representation of the differential operator on the coarser grids, uses the next finer grid coefficients to calculate the corresponding coarser grid values. The method freezes the values and derivatives of f on the finest grid, with the coarse grid coefficients being calculated from the next finer grid coefficients by straight injection. This strategy works well since f is smooth due to the linear interpolation used in the weak constraint method. This smoothness of f also explains why more sophisticated and expensive methods, such as full weighting, did not improve the representation of the coefficients on the coarser grids. Calculating coefficients using this method also proved less expensive than using the distortion function values to calculate the coefficients on all levels.

On the model problem, red-black relaxation produced grids with good orthogonality only for very small concavities (d < 0.1). Zebra line relaxation in the ξ direction exhibited similar behavior. The failure of these methods to handle large concavities is due to the anisotropic nature of the governing equations for large distortions.

Zebra line relaxation in the η direction matched the solutions obtained for small concavities with the previous relaxation methods, but also performed well for the larger concavities. Alternating zebra line relaxation showed similar behavior, but was more expensive to perform: the added direction of line relaxation did not significantly improve the multigrid convergence and so was not as efficient. Zebra line relaxation in the η direction was the optimal method, being sufficiently robust to handle the anisotropic nature of the equations for this geometry.
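The following Python sketch shows zebra line relaxation along η for a model anisotropic operator a u_ξξ + b u_ηη = s with b ≫ a, the kind of anisotropy discussed above. The constant-coefficient model problem, the zero Dirichlet data and all names are assumptions; the actual mapping equations have variable coefficients.

```python
def thomas_const(sub, diag, sup, rhs):
    """Tridiagonal solve with constant sub-, main and super-diagonal entries."""
    n = len(rhs)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = sup / diag, rhs[0] / diag
    for k in range(1, n):
        m = diag - sub * cp[k - 1]
        cp[k] = sup / m
        dp[k] = (rhs[k] - sub * dp[k - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for k in range(n - 2, -1, -1):
        x[k] = dp[k] - cp[k] * x[k + 1]
    return x


def zebra_eta_sweep(u, src, a, b, h):
    """One zebra sweep of eta-line relaxation for a*u_xixi + b*u_etaeta = src.

    u[i][j] has i along xi and j along eta; its outer ring holds Dirichlet data.
    Each line of constant i is solved exactly along eta: odd lines first, then even.
    """
    n = len(u) - 2
    for first in (1, 2):
        for i in range(first, n + 1, 2):
            rhs = [h * h * src[i][j] - a * (u[i - 1][j] + u[i + 1][j])
                   for j in range(1, n + 1)]
            rhs[0] -= b * u[i][0]
            rhs[-1] -= b * u[i][n + 1]
            u[i][1:n + 1] = thomas_const(b, -2.0 * (a + b), b, rhs)


def max_residual(u, src, a, b, h):
    """Largest |h^2 src - h^2 (a u_xixi + b u_etaeta)| over interior nodes."""
    n = len(u) - 2
    return max(abs(h * h * src[i][j]
                   - a * (u[i - 1][j] - 2.0 * u[i][j] + u[i + 1][j])
                   - b * (u[i][j - 1] - 2.0 * u[i][j] + u[i][j + 1]))
               for i in range(1, n + 1) for j in range(1, n + 1))


n, h = 15, 1.0 / 16
a, b = 0.01, 1.0           # assumed: coupling along eta much stronger than along xi
u = [[0.0] * (n + 2) for _ in range(n + 2)]
src = [[1.0] * (n + 2) for _ in range(n + 2)]
r0 = max_residual(u, src, a, b, h)
for _ in range(30):
    zebra_eta_sweep(u, src, a, b, h)
r30 = max_residual(u, src, a, b, h)
```

Each line solve removes all error along the strongly coupled η direction at once, which is why this smoother copes with the anisotropy; relaxing odd lines before even ones is what makes it "zebra".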

Let us mention that the choice of relaxation method to be employed is highly dependent on the domain geometry, the interpolation function, and the boundary value specifications of the weak constraint method.

In all cases, half injection was found to perform well as the restriction operator for the residual, since the solution appeared to be sufficiently smoothed on the finer grids in the cases where orthogonal grids were obtained. Full-weighting restriction was tried but did not improve the solver; it only added computational overhead and reduced the efficiency of the method. Standard bilinear interpolation was employed as the interpolation operator.

Figure 1(a). Model Grid Initial Guess

Figure 1(b). Model Grid Solution
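Minimal Python versions of the three transfer operators named in the text: straight injection (used for the coefficients), half injection (used for the residual) and bilinear interpolation (used for corrections). Vertex-centered grids with 2N + 1 fine points per side are assumed, and the names are hypothetical.

```python
def inject(fine):
    """Straight injection: coarse node (I, J) takes fine node (2I, 2J)."""
    return [row[::2] for row in fine[::2]]


def half_inject(fine_residual):
    """Half injection: one half of the injected fine-grid residual."""
    return [[0.5 * v for v in row] for row in inject(fine_residual)]


def bilinear_prolong(coarse):
    """Bilinear interpolation from an (N+1)^2 coarse grid to the (2N+1)^2 fine grid."""
    nf = 2 * (len(coarse) - 1)
    fine = [[0.0] * (nf + 1) for _ in range(nf + 1)]
    for i in range(nf + 1):
        for j in range(nf + 1):
            ci, cj = i // 2, j // 2              # coarse node at or just below (i, j)
            ci1, cj1 = ci + i % 2, cj + j % 2    # next coarse node if (i, j) lies between two
            fine[i][j] = 0.25 * (coarse[ci][cj] + coarse[ci1][cj]
                                 + coarse[ci][cj1] + coarse[ci1][cj1])
    return fine


def bilinear(x, y):
    return 1.0 + 2.0 * x + 3.0 * y + 4.0 * x * y


N = 4
coarse = [[bilinear(I / N, J / N) for J in range(N + 1)] for I in range(N + 1)]
fine = bilinear_prolong(coarse)
```

Bilinear interpolation reproduces any function of the form α + βx + γy + δxy exactly, which makes a convenient check.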

V-cycling with ν1 = 2 (relaxations before coarse grid correction) and ν2 = 1 (relaxations after coarse grid correction) performed most efficiently with regard to computational cost and smoothing efficiency. Typically, a single V-cycle per equation per iteration was sufficient for convergence. The results for different values of d on a 33 by 33 point grid with line relaxation in the η direction are shown in Table 2.

In [4], an ADI method was employed to solve the mapping equations. A time step equal to the discretization spacing h was recommended for the method. This solver was implemented for comparison with multigrid. In comparing this method to multigrid, it should be noted that 3 sweeps of the ADI method are approximately equal to the relaxation work of one multigrid V-cycle. The results for different values of d on a 33 by 33 point grid are shown in Table 3. Note that an increase in the number of ADI sweeps was necessary for convergence of the method. Comparison of the two methods shows that multigrid is more efficient, and its relative efficiency increases with the concavity of the domain.


Table 2: Effect of Concavity on Orthogonality
and Convergence of Multigrid.

   d      MDO (deg)   # of V-cycles   Iterations
  0.05      0.25            1              21
  0.10      0.85            1              24
  0.15      2.2             1              25
  0.20      6.4             1              20
  0.25     14.7             1              18

Table 3: Effect of Concavity on Orthogonality
and Convergence of ADI Method.

   d      MDO (deg)   # of Sweeps   Iterations
  0.05      0.24           3            21
  0.10      0.85           3            23
  0.15      2.3            4            24
  0.20      6.4            4            24
  0.25     14.8            5            28
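The text's rule of thumb (3 ADI sweeps are roughly one V-cycle of relaxation work) can be applied directly to Tables 2 and 3. The short Python script below only does that arithmetic; it ignores multigrid overhead for residuals, coefficients and transfers, so it is a rough comparison under that stated assumption.

```python
# (iterations, V-cycles or ADI sweeps per iteration), transcribed from Tables 2 and 3
MULTIGRID = {0.05: (21, 1), 0.10: (24, 1), 0.15: (25, 1), 0.20: (20, 1), 0.25: (18, 1)}
ADI = {0.05: (21, 3), 0.10: (23, 3), 0.15: (24, 4), 0.20: (24, 4), 0.25: (28, 5)}
ADI_SWEEPS_PER_VCYCLE = 3   # the rough equivalence quoted in the text


def work_in_vcycles(table, units_per_vcycle=1):
    """Total relaxation work for each concavity d, in V-cycle equivalents."""
    return {d: its * per_it / units_per_vcycle for d, (its, per_it) in table.items()}


mg = work_in_vcycles(MULTIGRID)
adi = work_in_vcycles(ADI, ADI_SWEEPS_PER_VCYCLE)
ratio = {d: adi[d] / mg[d] for d in mg}
for d in sorted(mg):
    print(f"d = {d:.2f}: multigrid {mg[d]:5.1f}, ADI {adi[d]:5.1f}, ADI/MG {ratio[d]:.2f}")
```

Under this accounting the ADI-to-multigrid work ratio is about 1.0 at d = 0.05 and about 0.96 at d = 0.10, then rises to about 2.6 at d = 0.25.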
The effect of discretization on the two methods was also studied. The results are shown in Tables 4 and 5. M denotes the discretization parameter: the number of points on a side of the grid is 2^M + 1. The results show that the efficiency of multigrid increases with finer discretization. In fact, for very fine discretizations, multigrid constructed more orthogonal grids than ADI. The behavior of ADI for increasing discretization suggests that tying Δt to h is a near-optimal way of setting the time step. The number of iterations required by multigrid or ADI does not increase dramatically with increasing M. This is not the case with SOR schemes, where the number of iterations increases rapidly with increasing discretization [6].

In  conclusion,  the  application  of  multigrid  to  the  model  problem  has  illustrated  how 
efficient  and  robust  a  properly  constructed  multigrid  grid  generation  method  may  be.  The 
algorithm  proved  to  be  very  competitive  with  a  near  optimum  ADI  technique.  The  basic 
elements  necessary  to  the  method  have  been  validated  and  can  now  be  applied  to  the 
infinite  airfoil  domain  problem. 


Table 4: Effect of Increasing Discretization on
Orthogonality and Convergence of Multigrid (d = 0.15).

# of V-cycles   Iterations
      1             21
      1             27
      1             30
      1             40

Table 5: Effect of Increasing Discretization on
Orthogonality and Convergence of ADI Method (d = 0.15).

  M    MDO (deg)   # of Sweeps   Iterations
  4      2.77           4            23
  5      2.2            4            24
  6      1.4            4            30
  7      1.6            4            30
2.4 Application to Infinite Airfoil Domains

In  Ryskin  and  Leal  [4],  a  unique  solution  to  the  problem  of  mapping  an  infinite  domain 
onto  a  finite  computational  domain  was  introduced.  By  combining  conformal  mapping 
with  the  orthogonal  grid  generation  algorithm  previously  described,  physical  domains  of 
infinite  extent  were  realized. 

To apply the method, a conformal transformation F(z) which maps a finite auxiliary domain (x, y), suitable for grid generation, to the infinite physical domain (X, Y) must be found. The preferred conformal mapping for airfoil domains is the Joukowski transformation defined by:

$$F(z) = X + iY = \frac{1}{2}\left(z + \frac{1}{z}\right) \qquad (6)$$

The inverse of this mapping, which maps the exterior of the airfoil onto the interior of a near-circular auxiliary domain, is given by:

$$G(Z) = x + iy = Z - \left(Z^2 - 1\right)^{1/2} \qquad (7)$$

To describe this adaptation of the cavity multigrid solver to flow about an airfoil [2,7,9], let us direct the reader to Figure 2. As can be seen, the general procedure for the construction of infinite coordinate systems proceeds as follows:

1. Calculate the image of the airfoil in the auxiliary domain from the airfoil coordinates in the physical domain using G(Z).

2. Employing the orthogonal grid generation procedure, calculate the orthogonal x and y coordinates in the auxiliary domain.

3. Calculate the X and Y coordinates in the physical domain using the x and y solutions obtained in step 2. The formulae relating the two domains, derived from F(z), are:

$$X = \frac{x}{2}\left(1 + \frac{1}{r^2}\right), \qquad Y = \frac{y}{2}\left(1 - \frac{1}{r^2}\right) \qquad (8)$$

where r² = x² + y².

4. Since the mapping from the x, y auxiliary domain to the X, Y physical domain is conformal, the X, Y grid will be orthogonal, and the scale factors of the physical domain are calculated from h1 and h2 by the formulae:

$$H_1 = |F'(z)|\,h_1, \qquad H_2 = |F'(z)|\,h_2 \qquad (9)$$
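Equations (6) through (9) translate into a few lines of Python using complex arithmetic. The sketch assumes that the branch of the square root in Eq. (7) is chosen so that the image lies inside the unit circle (the two roots of F(w) = Z have product 1, so exactly one of them does, away from the slit [-1, 1]); the function names are illustrative only.

```python
import cmath


def F(z):
    """Eq. (6): Joukowski map, auxiliary z = x + iy -> physical Z = X + iY."""
    return 0.5 * (z + 1.0 / z)


def G(Z):
    """Eq. (7): inverse map. Assumption: keep the root with |w| <= 1, so the
    exterior of the airfoil lands inside the unit circle."""
    s = cmath.sqrt(Z * Z - 1.0)
    w = Z - s
    return w if abs(w) <= 1.0 else Z + s


def physical_coords(x, y):
    """Eq. (8): X = x(1 + 1/r^2)/2, Y = y(1 - 1/r^2)/2, with r^2 = x^2 + y^2."""
    r2 = x * x + y * y
    return 0.5 * x * (1.0 + 1.0 / r2), 0.5 * y * (1.0 - 1.0 / r2)


def physical_scale_factors(z, h1, h2):
    """Eq. (9): H_k = |F'(z)| h_k, with F'(z) = (1 - 1/z^2)/2 (zero at z = +1, -1)."""
    dF = abs(0.5 * (1.0 - 1.0 / (z * z)))
    return dF * h1, dF * h2


Z = complex(-3.0, 0.5)          # a physical-plane point well away from the airfoil
w = G(Z)
X, Y = physical_coords(w.real, w.imag)
```

Points far from the airfoil map close to the origin, consistent with the origin of the auxiliary plane corresponding to infinity in the physical domain; |F'(z)| vanishes at z = ±1, which is why those points must be kept off the grid.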

This completes the construction of the infinite coordinate system. Solutions calculated on the computational grid now correspond to the infinite physical domain. Note that care must be exercised to avoid the singularities of the mapping at z = ±1 (x = ±1, y = 0) by proper positioning of the airfoil in the physical plane. Also, it is often necessary to calculate the derivatives X_η, Y_η, etc., for use in setting initial and boundary conditions and in performing force calculations. These should always be calculated using analytical formulas in terms of the auxiliary coordinates x and y.
In conclusion, this procedure provides unique coordinate systems, in which infinity boundary conditions may be applied at infinity. The standard ad hoc methods of applying infinity boundary conditions at truncated "infinity" boundaries are avoided. In addition, the difficulties posed by the need to specify some reasonable outflow condition at a downstream boundary are removed.

2.5  Multigrid  Calculation  of  Airfoil  Domains 

Multigrid  performed  well  for  the  calculation  of  the  orthogonal  coordinates  on  the  model 
cavity  domain.  However,  to  apply  this  method  to  airfoil  domains  several  issues  must 
be  addressed:  first,  the  form  of  the  distortion  function  to  be  used;  and  secondly,  the 
appropriate  relaxation  method  to  be  employed  as  the  multigrid  smoother. 

The ξ, η computational domain, x, y auxiliary domain and X, Y physical domain are shown in Figure 2. As can be seen, the η axis is the image of the origin in the x, y plane, which corresponds to infinity in the physical domain, so the mapping is singular there. As was pointed out in [4], the distortion function should be set to zero at such locations, i.e., f = 0 at ξ = 0. To satisfy this constraint, the simple weak constraint formula f(ξ, η) = ξ f(1, η) can be used.

The form of the weak constraint has a significant effect on the nature of the grid generation equations. The behavior of the distortion function, i.e., f → 0 as ξ → 0, makes the coefficients in the η direction (which involve 1/f) very large compared to the coefficients in the ξ direction. This requires the use of line relaxation in the η direction as the smoother in the multigrid process, which effectively takes into account the anisotropic nature of the governing equations for this form of the distortion function.

In order to perform the line relaxations in the η direction, the tridiagonal solver used for the cavity must be modified to account for the periodic nature of the solution sought. The procedure employed is known as a rank-one modification technique and requires two tridiagonal solutions for each line relaxation performed. This effectively doubles the computational work necessary to perform line relaxation on a periodic domain. Appropriate relaxations may now be performed in the η direction. The procedure also has applications to other solution methods, such as ADI, on periodic domains.
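One standard form of the rank-one modification is the Sherman-Morrison treatment of a cyclic tridiagonal system, sketched below in Python. Whether this is exactly the variant the authors used is an assumption, and the names are hypothetical; it does need the two tridiagonal solutions mentioned in the text.

```python
def thomas(a, b, c, d):
    """Tridiagonal solve: a = sub-diagonal (a[0] unused), b = diagonal,
    c = super-diagonal (c[-1] unused), d = right-hand side."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def cyclic_thomas(a, b, c, d):
    """Periodic system a[i] x[i-1] + b[i] x[i] + c[i] x[i+1] = d[i] (indices mod n, n >= 3),
    solved as a rank-one (Sherman-Morrison) modification of a tridiagonal system."""
    n = len(d)
    alpha, beta = c[-1], a[0]          # corner entries A[n-1][0] and A[0][n-1]
    gamma = -b[0]                      # any nonzero value; -b[0] avoids cancellation
    bb = list(b)
    bb[0] = b[0] - gamma
    bb[-1] = b[-1] - alpha * beta / gamma
    y = thomas(a, bb, c, d)            # first tridiagonal solution
    u = [0.0] * n
    u[0], u[-1] = gamma, alpha
    z = thomas(a, bb, c, u)            # second tridiagonal solution
    fact = (y[0] + beta * y[-1] / gamma) / (1.0 + z[0] + beta * z[-1] / gamma)
    return [yi - fact * zi for yi, zi in zip(y, z)]


n = 6
a, b, c = [1.0] * n, [-4.0] * n, [1.0] * n
d = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = cyclic_thomas(a, b, c, d)
```

For a line relaxation in η, a, b and c would hold the coupling coefficients along the periodic line, and d the right-hand side including the terms from the neighboring ξ lines.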

In order to calculate an orthogonal grid, a suitable image of the airfoil must be obtained in the auxiliary plane. For the NACA 00 series airfoils, the following method worked well. As an example, consider the application of the method to the NACA 0015 airfoil, which will be needed for the flow computation. First, the coordinates of a unit-length NACA 0015 airfoil were calculated from the equation for NACA 00 series airfoils:

$$\pm y = \frac{t}{0.2}\left(0.2969\sqrt{x} - 0.1260\,x - 0.3516\,x^2 + 0.2843\,x^3 - 0.1015\,x^4\right) \qquad (10)$$

where t = 0.15. The leading edge of the airfoil is located at the origin. To position the airfoil in the physical domain, the following formulae were used:

$$Y = (2 + c_1)\,y, \qquad X = (2 + c_1)\,x - (1 + c_3 c_1) \qquad (11)$$

The constants c1 and c3 were found by trial and error, with c1 = 0.05 and c3 = 0.95 producing a suitable shape in the auxiliary domain. Computation of the auxiliary coordinates can now be performed using multigrid or some other method of choice.
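The airfoil-image step can be sketched in Python by combining Eqs. (10), (11) and (7). The number of stations, their cosine spacing and the helper names are assumptions of this sketch; c1 and c3 take the values quoted in the text.

```python
import cmath
import math


def naca00_half_thickness(x, t=0.15):
    """Eq. (10): half-thickness of a unit-chord NACA 00xx section (t = 0.15 for NACA 0015)."""
    return t / 0.2 * (0.2969 * math.sqrt(x) - 0.1260 * x - 0.3516 * x ** 2
                      + 0.2843 * x ** 3 - 0.1015 * x ** 4)


def position(x, y, c1=0.05, c3=0.95):
    """Eq. (11): scale and shift the unit-chord section in the physical plane."""
    return (2.0 + c1) * x - (1.0 + c3 * c1), (2.0 + c1) * y


def G(Z):
    """Eq. (7), keeping the root inside the unit circle (assumed branch choice)."""
    s = cmath.sqrt(Z * Z - 1.0)
    w = Z - s
    return w if abs(w) <= 1.0 else Z + s


def airfoil_image(n=41, t=0.15):
    """Auxiliary-plane images of n chordwise stations on both surfaces.

    Assumption: cosine spacing, which clusters stations near both edges."""
    image = []
    for k in range(n):
        x = 0.5 * (1.0 - math.cos(math.pi * k / (n - 1)))
        yt = naca00_half_thickness(x, t)
        surfaces = (yt, -yt) if k > 0 else (0.0,)   # the leading edge appears once
        for y in surfaces:
            X, Y = position(x, y)
            image.append(G(complex(X, Y)))
    return image


pts = airfoil_image()
print(len(pts), max(abs(w) for w in pts), min(abs(w) for w in pts))
```

With c1 = 0.05 and c3 = 0.95 the leading edge sits at X = -1.0475 and the trailing edge at X = 1.0025, so the section encloses both singular points z = ±1 of the mapping, and every image point lands strictly inside the unit circle.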

The multigrid restriction and interpolation employed on the model cavity domain were implemented for calculation of the auxiliary coordinates. The only modification necessary for this domain was the addition of restriction and interpolation of function values at the boundaries ξ = 0 and η = 0, since the points along these lines are now considered to be interior points.

Line relaxation in the η direction as well as an ADI method were also implemented. Both methods produced orthogonal grids with a converged MDO value of 0.9 degrees. A coarser representation of the calculated auxiliary grid, together with the corresponding airfoil grid, is shown in Figure 3.

To  assess  the  performance  of  the  two  methods  on  the  given  computational  domain,  a 
study  was  performed  to  determine  the  effect  of  sweeps  and  V-cycling  on  the  two  methods. 
Since the auxiliary domain was optimized for grid generation, the initial guess was quite good, giving an initial MDO of 6 degrees (compared to the large initial MDO values of 17 to 56 degrees for the model cavity domain). Thus, extra sweeps or smoothing steps were found to be inefficient; they only added computational overhead to the calculations.
This  is  shown  in  Table  6. 


Table 6: Effect of V-Cycling on Multigrid and Number of Sweeps on ADI Method

V-cycles | ν1 | ν2 | Multigrid iterations | ADI sweeps | ADI iterations
1 | 1 | 0 | 25 | 1 | 47
1 | 1 | 1 | 25 | 2 | 35
1 | 2 | 1 | 24 | 3 | 27

It appears that multigrid is the more efficient of the two methods for the airfoil problem, but this is not really the case. When only one relaxation is performed on each grid level, the overhead for calculating residuals and coefficients becomes large compared to the actual relaxation work done. In terms of computational time, ADI with one sweep is the quickest solver, followed closely by multigrid with ν1 = 1, ν2 = 0 relaxations.

The real advantage of multigrid in this setting is in the solution of Poisson's equation, which must be solved at each time step of the unsteady flow calculation. This makes multigrid the far superior method for unsteady flows.

We used a simple correction scheme (CS) rather than a full approximation scheme (FAS) because the grid generation was a much smaller computational burden than the Poisson solver in the unsteady flow computations. More research is needed on orthogonal multigrid grid generation schemes that employ FAS and update the distortion function during cycling. This approach is feasible and could reduce the number of iterations required for convergence.
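To make the CS/FAS distinction concrete, here is a minimal one-dimensional sketch of a correction-scheme V-cycle for Poisson's equation $-u'' = f$ with zero Dirichlet data. The paper's Poisson solver is two-dimensional and its components are not specified here, so the uniform grid with $2^p$ intervals, Gauss-Seidel smoothing, full weighting and linear interpolation are all our assumptions; `nu1` and `nu2` play the role of ν1 and ν2 in Table 6.

```python
import numpy as np

def residual(u, f, h):
    """r = f - A u for the 3-point discretization of -u'' (boundary entries stay 0)."""
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2.0 * u[1:-1] - u[:-2] - u[2:]) / h**2
    return r

def gauss_seidel(u, f, h, sweeps):
    for _ in range(sweeps):
        for i in range(1, len(u) - 1):
            u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
    return u

def restrict(r):
    """Full weighting onto every other point."""
    rc = r[::2].copy()
    rc[1:-1] = 0.25 * r[1:-2:2] + 0.5 * r[2:-1:2] + 0.25 * r[3::2]
    return rc

def prolong(ec):
    """Linear interpolation of a coarse correction to the fine grid."""
    e = np.zeros(2 * (len(ec) - 1) + 1)
    e[::2] = ec
    e[1::2] = 0.5 * (ec[:-1] + ec[1:])
    return e

def v_cycle(u, f, h, nu1=1, nu2=1):
    """Correction-scheme V-cycle: only residuals and corrections go to coarser grids."""
    if len(u) <= 3:                      # coarsest grid has one unknown: solve exactly
        u[1] = 0.5 * (u[0] + u[2] + h * h * f[1])
        return u
    u = gauss_seidel(u, f, h, nu1)       # nu1 pre-smoothing sweeps
    rc = restrict(residual(u, f, h))
    ec = v_cycle(np.zeros_like(rc), rc, 2.0 * h, nu1, nu2)
    u += prolong(ec)                     # coarse-grid correction
    return gauss_seidel(u, f, h, nu2)    # nu2 post-smoothing sweeps

n = 64
h = 1.0 / n
x = np.linspace(0.0, 1.0, n + 1)
f = np.pi**2 * np.sin(np.pi * x)
u = np.zeros(n + 1)
for cycle in range(20):
    u = v_cycle(u, f, h)
```

An FAS version would instead transfer the current approximation itself to the coarse grid together with a τ-correction, which is what would allow the distortion function to be updated during cycling.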


2.6  Computation  of  Flows  About  Airfoils 

Figures 4 & 5 illustrate results for full unsteady Navier-Stokes flow about a NACA 0015 airfoil. As described above, the physical domain is mapped to an auxiliary domain on which the physical flow is computed. Briefly, the multigrid solver is used for the stream function computations, an ADI scheme advances the vorticity values in time, and the multigrid grid generation technique provides the boundary-fitted coordinates. The results given here improve on our preliminary results in [7] by providing exact time correlation with the physical simulations described in [10]. Further results and details will be reported in [9, 11].

The  flow  illustrated  in  Figure  4  is  an  accelerating  flow  from  rest  past  a  NACA  0015 
airfoil. The Reynolds number based on the acceleration of the flow is $R_{acc} = 835$. The flow
visualization  photographs  on  the  right  were  obtained  using  the  vorticity  tagging  method 
described  in  [8].  The  flow  is  visualized  by  placing  on  the  upper  surface  of  the  airfoil  a  liquid 
which  reacts  with  air  to  form  a  dense  smoke.  The  resulting  streaklines  detail  the  developing 
flow.  The  smooth  metal  surface  of  the  airfoil  reflects  a  mirror  image  of  this  detail  which 
should  not  be  confused  with  the  real  flow.  In  Figure  5,  an  impulsively  accelerated  flow 
from  rest  is  shown.  The  Reynolds  number  for  this  flow,  based  on  the  free  stream  velocity, 
is  R  =  1000.  In  both  of  these  comparisons,  note  the  excellent  resolution  of  fine  vortical 
detail  and  temporal  correlation  between  the  experimental  and  numerical  studies. 


3.  Conclusions 

A next step, and an important one, is to study the use of multigrid nested localizations for these airfoil applications. The flow results in Figures 4 & 5 did not involve the localization scheme described in the first section above, which proved so useful for the cavity applications, because we do not yet know what subgrid information is best carried to the next time step, nor the appropriate method for generating these subgrids.

Other  interesting  geometries  for  study  include  the  Taneda  Wedge  [11].  This  would 
be  especially  interesting  for  acquiring  some  basic  understanding  of  parallel  mappings  and 
parallel  computations  of  steady  flows  in  physical  subdomains. 

In particular, a much needed investigation is that of parallel control of residual buildup. One does not want a total flow computation polluted by a growing error in one subdomain. As discussed above, our algorithm's eventual resolution is limited not by machine precision or grid precision but by fine structure residuals. As such, it would provide an excellent vehicle for parallel computational studies.

The  combination  of  multigrid  localization  algorithms  and  multigrid  grid  generation 
and  mapping  as  reported  here  would  appear  to  be  of  interest  for  a  wide  variety  of  problems, 
both  numerical  and  physical. 


Acknowledgments 

The  work  of  R.  Leben  was  supported  by  the  U.S.  Air  Force  Office  of  Scientific  Research 
under  Grant  81-0037  and  by  NASA  Headquarters  Grant  NAGW-915.  We  would  also  like 
to  acknowledge  computer  time  obtained  from  the  John  von  Neumann  Computing  Center 
and  the  Colorado  Center  for  Astrodynamics  Research. 

References 


[1]. K. Gustafson and R. Leben, "Multigrid calculation of subvortices," Applied Math. and Computation, 19 (1986), 89-102.

[2]. K. Gustafson and R. Leben, to appear.

[3]. J.F. Thompson, Z.U.A. Warsi, and C.W. Mastin, "Numerical Grid Generation," Elsevier Science Publishing (1985).

[4]. G. Ryskin and L.G. Leal, "Orthogonal mapping," J. Computational Physics, 50 (1983), 71-100.

[5]. Z.U.A. Warsi, "Numerical Generation of Orthogonal and Non-Orthogonal Coordinates in Two Dimensional Singly and Doubly Connected Regions," von Karman Institute Technical Note 151 (1984).

[6]. E.D. Chikhliwala and Y.C. Yortsos, "Application of orthogonal mapping to some two-dimensional domains," J. Computational Physics, 57 (1985), 391-402.

[7]. K. Gustafson and R. Leben, "Vortex Subdomains," Proc. 1st International Conf. on Domain Decomposition of Partial Differential Equations, Paris, 7-9 January 1987, SIAM (1987). See also K. Gustafson, Partial Differential Equations, 2nd Edition, Wiley, New York (1987).

[8]. P. Freymuth, "The vortex patterns of dynamic separation: a parametric and comparative study," Prog. Aerospace Sciences, 22 (1985), 161-208.

[9]. C.Y. Chow, K. Gustafson and R. Leben, to appear.



[10]. F. Finash, "Experimental Study of Two-Dimensional Vortex Patterns for Impulsively Started Bodies in Comparison with other Configurations," Ph.D. Thesis, Dept. Aerospace Engineering Sciences, University of Colorado (1987).

[11]. S. Taneda, "Visualization of separating Stokes flows," J. Phys. Soc. Japan, 46 (1979), 1935-1942.


Applications  of  the  Fast  Adaptive 
Composite  Grid  Method 


By M. Heroux†, S. McCormick‡, S. McKay†, J.W. Thomas†
†Department of Mathematics
Colorado State University
Ft. Collins, Co.

‡Department of Mathematics
University of Colorado-Denver
Denver, Co.


In  this  paper  we  present  a  mesh  refinement  scheme,  FAC,  which  is  designed  to  solve 
nonlinear  partial  differential  equations  when  more  resolution  is  needed  in  one  area  of  the 
domain  than  in  others.  The  scheme  is  based  on  the  full  approximation  scheme  of  multigrid 
and  depends  strongly  on  the  choice  of  the  difference  operator  on  the  grid  interface  of  the 
composite grid. FAC is made self-adaptive by means of Richardson extrapolation. Results applying
FAC  to  several  problems  are  presented. 

1.  Introduction 

There is a critical need for mesh refinement schemes in numerical partial differential equations. Many problems exhibit solution behavior that requires more resolution in one area
of  the  domain  than  in  others.  For  problems  that  are  large  enough  to  prohibit  use  of  a 
global  fine  grid,  some  sort  of  local  mesh  refinement  scheme  is  essential. 

There  are  several  criteria  that  one  must  consider  when  developing  a  mesh  refinement 
scheme.  First,  its  cost  should  not  increase  dramatically  with  the  insertion  of  a  local  fine 
grid.  Second,  this  local  grid  should  not  require  that  the  global  problem  be  completely 
reconsidered.  Third,  this  introduction  of  the  local  grids  should  not  require  significant 
recoding costs. Fourth, the scheme should allow for effective self-adaptive strategies; applications do not always easily allow a priori determination of the regions that require
local  refinement.  Fifth,  it  should  run  efficiently  on  vector  and  parallel  machines.  Finally, 
introduction  of  local  fine  grids  should  not  introduce  any  false  physics  into  the  results. 

The fast adaptive composite grid method (FAC; cf. [1], [2]) is a mesh refinement scheme that was developed with the above attributes in mind. FAC was principally developed for elliptic equations with emphasis on resolving the areas near wells in oil reservoir simulations, but appears equally applicable to a variety of other problems. As will be seen
in  the  next  section,  FAC  handles  regions  that  require  local  resolution  with  independent 
rectangular  patches  or  overlapping  groups  of  such  patches.  This  patch  structure  is  what 
gives  the  FAC  algorithm  many  of  its  attributes.  Communication  between  these  patches 
and  their  host  grids  is  done  using  multigrid-like  techniques  although  the  grid  solvers  are 
not  restricted  to  multigrid.  They  can  be  any  of  a  variety  of  iterative  or  direct  methods. 

This  work  was  supported  by  the  Air  Force  Office  of  Scientific  Research 
under  grant  number  AFOSR-86-0126  and  the  Department  of  Energy  under 
grant  number  DE-AC03-84er.  ©1987,  Colorado  Research  Development  Corporation 
There are many different types of mesh refinement schemes in the literature. FAC is similar to the schemes of Bai and Brandt [3], Berger [5] and Caruso, Ferziger and Oliger [4]. The scheme due to Bai and Brandt is fully integrated with multigrid and uses FAS [3] to provide intergrid communication. Their approach differs mainly in that it does not focus on a composite grid as FAC does. Berger's scheme is specifically designed for explicit time-dependent problems, though some of the technical grid processing ideas used in the FAC algorithm were motivated by her work. A more basic difference between the schemes given in [3]-[5] and FAC, however, lies in the way the boundaries of the refined regions are treated. As we will see in Sections 2 and 3, this treatment of the internal boundaries determines what discrete problem is actually being solved and facilitates control of the effects of the fine grid on the physics of the problem. This treatment of internal boundaries also allows for a simple theory of convergence and complexity.

Another scheme, developed recently by Bramble, Ewing, Pasciak and Schatz [7], is essentially the same approach as FAC, but it is placed in a finite element setting and used as a preconditioner rather than as a solver.

In the next section we describe the FAC algorithm. Section 3 is devoted to a discussion of how we choose the composite grid operator. The self-adaptive scheme that we have implemented is described in Section 4. Finally, results from applying FAC to several problems are included in Section 5.

2.  The  FAC  Algorithm 

The  FAC  algorithm  is  designed  to  solve  partial  differential  equations  where  there  are  one  or 
more  local  regions  that  require  a  high  degree  of  resolution  and  accuracy.  Several  different 
schemes  and  general  purpose  software  for  these  schemes  have  been  or  are  currently  being 
developed.  The  first  of  these  schemes  is  the  linear  FAC  algorithm  which  is  based  on 
residual  correction  and  is  described  in  [2].  In  [2]  it  is  shown  that,  under  reasonably  mild 
hypotheses  (i.e.,  that  the  composite  grid  operator  is  essentially  symmetric  and  positive 
definite),  the  FAC  algorithm  is  convergent  with  rates  that  depend  on  certain  regularity 
and  approximation  properties.  These  results  include  the  case  when  the  grid  equations  are 
only  solved  approximately.  The  numerical  results  reported  in  [2],  [8]  and  [9]  show  that 
the  rate  of  convergence  of  FAC  is  very  good  and  that  it  is  applicable  to  a  wide  variety  of 
problems not covered by the theory in [2].

In  this  paper  we  shall  describe  a  nonlinear  FAC  algorithm  that  is  based  on  the  full 
approximation  scheme  of  multigrid.  Here,  for  simplicity,  we  formulate  FAC  using  one 
refinement  region  and  one  local  grid.  Though  theory  is  not  yet  available  for  this  FAC 
scheme,  observed  convergence  rates  are  very  similar  to  those  for  the  corresponding  linear 
problems. 

Suppose the partial differential equation we wish to solve is given by $H(v) = g$, including boundary conditions on the domain $\Omega$. Suppose for convenience that $\Omega$ is a rectangle in $\mathbb{R}^2$ containing a proper rectangular subregion, $\Omega_F$. Suppose that $\Omega_F$ requires a finer resolution (grid spacing) than the rest of $\Omega$. Then, given a coarse grid $G$ with grid spacing $\Delta x = \Delta y$, on $\Omega_F$ (which is assumed to be aligned with the coarse grid) we place a finer grid $\mathcal{G}_F$ with grid spacing $\delta x = \delta y$. We assume that $\Delta x = \Delta y = m\,\delta x = m\,\delta y$, where the mesh ratio $m$ is a positive integer. The composite grid $\mathcal{G}$ is defined to be the union of $G$ and $\mathcal{G}_F$, as illustrated by Figure 1.
Figure 1. Example of a composite grid.

Consider an approximation to the given partial differential equation and boundary conditions on the composite grid $\mathcal{G}$. Assume that this approximation can be written as

$$\mathcal{L}(u) = f, \tag{1}$$

where $\mathcal{L}$ is the nonlinear operator resulting from a discrete approximation of the partial differential operator $H$ and $f$ represents the inhomogeneous terms in the equation and boundary conditions.

The composite grid can be partitioned so that $\mathcal{G} = \mathcal{G}_C \cup \mathcal{G}_I \cup \mathcal{G}_F$, where $\mathcal{G}_I$ consists of the coarse grid points along the boundary of $\Omega_F$, $\mathcal{G}_F$ the fine grid points inside $\Omega_F$ and $\mathcal{G}_C$ the coarse grid points outside of $\Omega_F$. The partitioning is illustrated in Figure 2.

The coarse grid, $G$, can be partitioned similarly as $G = G_C \cup G_I \cup G_F$, where $G_C$ consists of the coarse grid points outside the boundary of $\Omega_F$ ($G_C = \mathcal{G}_C$), $G_I$ the coarse grid points on the boundary of $\Omega_F$ ($G_I = \mathcal{G}_I$) and $G_F$ the coarse grid points inside $\Omega_F$ ($G_F \subset \mathcal{G}_F$).
Figure 2. $\mathcal{G}$ = □ ∪ ■ ∪ ◇ ∪ ○, $\mathcal{G}_C$ = □, $\mathcal{G}_I$ = ■, $\mathcal{G}_F$ = ◇ ∪ ○, $G$ = □ ∪ ■ ∪ ◇, $G_C$ = □, $G_I$ = ■, and $G_F$ = ◇.
$\mathcal{G}$, $\mathcal{G}_F$ and $G$ are the principal grids used in the FAC algorithm. To be able to pass information between them, assume that a prolongation or interpolation operator, $I$, and a restriction operator, $I^T$, have been defined so that

$$I : G \to \mathcal{G} \tag{2}$$

and

$$I^T : \mathcal{G} \to G. \tag{3}$$

Using $I$ and $I^T$, an operator $L$ on the coarse grid can be defined according to the Galerkin condition

$$L = I^T \circ \mathcal{L} \circ I. \tag{4}$$
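For a linear operator, the Galerkin condition (4) is just the matrix triple product $L = I^T \mathcal{L} I$. As a hypothetical check on a uniform 1D grid (not a composite grid), with $\mathcal{L}$ the standard 3-point Laplacian and $I$ linear interpolation, the product is exactly twice the standard coarse-grid Laplacian; the factor 2 appears because $I^T$ is the unscaled transpose of $I$, i.e. twice full weighting. The helper names are ours.

```python
import numpy as np

def laplacian_1d(n, h):
    """3-point discretization of -u'' on n interior points with spacing h."""
    return (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

def linear_interpolation(nc):
    """I: coarse interior point j -> fine point 2j+1; its fine neighbours get weight 1/2."""
    nf = 2 * nc + 1
    I = np.zeros((nf, nc))
    for j in range(nc):
        I[2 * j + 1, j] = 1.0
        I[2 * j, j] = 0.5
        I[2 * j + 2, j] = 0.5
    return I

h = 1.0 / 16
nc = 7                                  # coarse interior points, spacing 2h
A = laplacian_1d(2 * nc + 1, h)         # fine operator (plays the role of the composite operator)
I = linear_interpolation(nc)
L = I.T @ A @ I                         # Galerkin condition (4) with restriction I^T
```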

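Based on the above partitioning of the grids $\mathcal{G}$ and $G$, we can partition $u$, $U$, $\mathcal{L}$ and $L$ as

$$u = (u_C, u_I, u_F)^T, \qquad U = (U_C, U_I, U_F)^T, \tag{5, 6}$$

$$\mathcal{L} = \begin{pmatrix} \mathcal{L}_{CC} & \mathcal{L}_{CI} & 0 \\ \mathcal{L}_{IC} & \mathcal{L}_{II} & \mathcal{L}_{IF} \\ 0 & \mathcal{L}_{FI} & \mathcal{L}_{FF} \end{pmatrix}, \tag{7}$$

$$L = \begin{pmatrix} L_{CC} & L_{CI} & 0 \\ L_{IC} & L_{II} & L_{IF} \\ 0 & L_{FI} & L_{FF} \end{pmatrix}. \tag{8}$$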

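Note here that $\mathcal{L}_{CC} = L_{CC}$, $\mathcal{L}_{IC} = L_{IC}$ and $\mathcal{L}_{CI} = L_{CI}$. $\mathcal{L}_{FF}$ is similar to $L_{FF}$ except that there are more grid points, and thus more entries in the $\mathcal{L}_{FF}$ block. The significant difference between $L$ and $\mathcal{L}$ is in the blocks $\mathcal{L}_{IF}$ and $\mathcal{L}_{FI}$, which couple the interface points $\mathcal{G}_I$ to the fine grid points $\mathcal{G}_F$ and vice versa; there $\mathcal{L}$ must "reach" across the grid interface. How this is done is what ultimately defines the composite grid operator $\mathcal{L}$.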

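With this machinery in place, an FAC cycle that allows for nonlinear operators can be written as follows.

$$\begin{aligned}
&\text{Step 1.}\quad U \leftarrow L^{-1}\left\{ I^T\big(f - \mathcal{L}(u)\big) + L\big(I^T u\big) \right\}\\
&\text{Step 2.}\quad u \leftarrow u + I\big(U - I^T u\big)\\
&\text{Step 3.}\quad u_F \leftarrow \mathcal{L}_{FF}^{-1}\left\{ f_F - \mathcal{L}_{FI}(u_I) \right\}
\end{aligned} \tag{9}$$

Here, each step is an assignment statement represented by a left arrow.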

Beginning with a zero initial guess for $u$ (or any other guess if one is available), in Step 1 the coarse grid problem is solved as if there were no fine patch. After the first cycle, the right-hand side of Step 1 is augmented with the residual to make a correction to the most recent approximation to the composite grid solution.

If the composite grid problem (1) is expanded using the block forms of $\mathcal{L}$, $u$ and $f$, it is easy to see that Step 1 is an approximate solver for the first two equations and that Step 3 solves the third equation using the previous approximations for $u_C$ and $u_I$.
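The cycle (9) can be tried on a small linear model. The sketch below is a hypothetical 1D analogue, not the authors' FAC software: $-u'' = f$ on (0, 1) with zero Dirichlet data, one patch with mesh ratio $m$, a flux-balance (finite volume) composite operator in the spirit of Section 3, piecewise-linear $I$, and $L$ from the Galerkin condition (4). Because the operator is linear here, $\mathcal{L}(u) = Au$ and $L(I^T u) = L I^T u$, so the restricted $u$ cancels between Steps 1 and 2 and those two steps reduce to a coarse-grid residual correction. All function names are ours.

```python
import numpy as np

def build_composite_1d(N, m, ka, kb):
    """Composite grid on (0, 1): coarse spacing H = 1/N plus a patch of spacing
    h = H/m on (ka*H, kb*H). Returns nodes, fine-point and interface-point indices."""
    h = 1.0 / (N * m)
    coarse = [m * k for k in range(1, N) if k <= ka or k >= kb]
    fine = list(range(m * ka + 1, m * kb))
    pos = np.array(sorted(coarse + fine))                     # positions in units of h
    F = np.where((pos > m * ka) & (pos < m * kb))[0]          # fine points inside the patch
    I_idx = np.where((pos == m * ka) | (pos == m * kb))[0]    # interface points
    return pos * h, F, I_idx

def composite_operator(x, f):
    """Flux-balance form of -u'' = f, u(0) = u(1) = 0, on a nonuniform 1D grid."""
    xs = np.concatenate(([0.0], x, [1.0]))
    n = len(x)
    A = np.zeros((n, n))
    rhs = np.zeros(n)
    for i in range(n):
        hl = xs[i + 1] - xs[i]           # distance to left neighbour
        hr = xs[i + 2] - xs[i + 1]       # distance to right neighbour
        A[i, i] = 1.0 / hl + 1.0 / hr
        if i > 0:
            A[i, i - 1] = -1.0 / hl
        if i < n - 1:
            A[i, i + 1] = -1.0 / hr
        rhs[i] = f(xs[i + 1]) * 0.5 * (hl + hr)   # source over the control volume
    return A, rhs

def interpolation(x, N):
    """I: G -> composite grid, piecewise-linear interpolation of coarse values."""
    xc = np.linspace(0.0, 1.0, N + 1)
    I = np.zeros((len(x), N - 1))
    for k in range(1, N):
        e = np.zeros(N + 1)
        e[k] = 1.0
        I[:, k - 1] = np.interp(x, xc, e)
    return I

def fac_cycle(A, L, I, f, u, F, I_idx):
    """Steps 1-3 of (9), written for a linear composite operator A."""
    U = np.linalg.solve(L, I.T @ (f - A @ u) + L @ (I.T @ u))   # Step 1
    u = u + I @ (U - I.T @ u)                                     # Step 2
    u[F] = np.linalg.solve(A[np.ix_(F, F)],                       # Step 3
                           f[F] - A[np.ix_(F, I_idx)] @ u[I_idx])
    return u

N, m, ka, kb = 8, 4, 2, 4               # patch on (0.25, 0.5), mesh ratio 4
x, F, I_idx = build_composite_1d(N, m, ka, kb)
A, f = composite_operator(x, lambda s: np.pi**2 * np.sin(np.pi * s))
I = interpolation(x, N)
L = I.T @ A @ I                         # Galerkin coarse operator, eq. (4)
u = np.zeros_like(f)
for cycle in range(5):
    u = fac_cycle(A, L, I, f, u, F, I_idx)
```

In this 1D linear model the error left after Step 3 is linear across the patch and therefore lies in the range of $I$, so the next coarse correction removes it and the iteration reaches the discrete composite solution after two cycles, up to rounding. That exactness is special to the 1D model; the nonlinear case needs the full FAS form of Step 1.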
3. The Composite Grid Operator

The composite grid operator $\mathcal{L}$ was introduced as the desired approximation to the partial differential equation on the composite grid. This, however, understates the importance of the choice of $\mathcal{L}$. The form of $\mathcal{L}$ is usually very easy to determine in regions that are not near the internal boundaries of the patches because, in these regions, we use the same difference operator that would be used if there were no patches. Of course, if desired, it is permissible to use different operators in each region, say something approximating an inviscid flow on the coarse grid away from boundaries and something approximating a thin-layer Navier-Stokes equation in the boundary layer. However, the form of $\mathcal{L}$ on the internal boundary must be chosen with much more care. Since the patches are usually in regions of the domain that are driving the entire problem, a careless definition of $\mathcal{L}$ on the boundary of the patches can very easily introduce false physics into the problem. We have developed several ways to produce $\mathcal{L}$, each requiring that care be taken in treating the fluxes across internal boundaries.

The first approach we took to define $\mathcal{L}$ at the grid interface was simply to form a Taylor series expansion, choosing appropriate terms that yield a consistent difference operator. The problem with this approach is that it focuses primarily on truncation error, not actual accuracy, so the results can be misleading. Nevertheless, it provides an important tool for assessing the consistency of $\mathcal{L}$ at these irregular interfaces.

In principle, it is easy to form $\mathcal{L}$ for variational problems. In this case, it is acceptable to interpolate the coarse grid values to the internal boundary and one row or column of ghost points, and then evaluate the residual at the interface points by calculating the appropriate fine grid operator. This approach is especially useful with finite element formulations. For example, for two-dimensional elliptic problems, this is effective for certain 9-point stencils and full weighting, but it is inappropriate for 5-point discretizations.

The last approach, which seems most effective and general, is based on finite volumes and works more directly with the original physical system. It amounts to balancing carefully chosen mass flow fluxes across the boundary. In concert with this, we attempt not only to approximate the continuous conservation law, but also to have the discrete system $\mathcal{L}$ and the iteration (Steps 1-3) be conservative.

For example, consider approximating a Laplacian operator $H(u) = \nabla^2 u$ at point 1 on the grid interface pictured in Figure 3. The equation at the point will be determined by balancing fluxes in the control volume drawn (with broken lines) about the point. The flux across the top is the same as usual (a coarse-grid difference quotient times $\Delta x$), and the fluxes across the right and left sides are as usual except that they must reflect the shorter length of the vertical sides of the control volume.

The bottom side of the box controls the transition between the two grids. One reasonably logical and commonly used approach is to approximate the derivative along the bottom by the difference quotient that the standard 5-point stencil at point 1 would use, and to take that quotient times $\Delta x$ as the flux across the bottom boundary. This discrete flux is a good approximation to the continuous flux but can cause problems numerically, as will be seen in the transonic flow problem in Section 4. The problem with using the above flux across the bottom boundary of the control volume is that it does not conserve mass discretely. The fine grid operators at points 5 and 7 reach to points 9 and 8, respectively, to compute the fluxes across the tops of their control volumes. Points 9 and 8 are not true grid points but interpolated points. Thus the equations at points 5 and 7 depend on point 1, but in the above formulation, the equation at point 1 does not depend on points 5 and 7. More precisely, the flux used for point 1 at this bottom boundary is not the one used for points 5 and 7; the fluxes are not in balance. We refer to this approach as the 5-point scheme.

Figure 3.

One approach we have used to correct this problem is to divide the bottom boundary into three parts (of lengths $\delta x/2$, $\delta x$ and $\delta x/2$) and compute the fluxes across the two outer regions using the interpolated values at points 9 and 8. Thus the flux across the bottom boundary is given by

$$\frac{u_5 - u_9}{\delta y}\,\frac{\delta x}{2} \;+\; (\ldots)\,\delta x \;+\; \frac{u_7 - u_8}{\delta y}\,\frac{\delta x}{2}.$$


If a similar expression is used to calculate the flux across the bottom boundary of the control volume associated with point 2, the contribution associated with point 7 will balance exactly with the flux across the top of the control volume of point 7. Extending this analysis to all points on the grid interface shows that $\mathcal{L}$ conserves mass discretely. We call this approach the 7-point scheme.


4.  Self-Adaptivity 

The  FAC  algorithm  will  accept  a  variety  of  self-adaptive  schemes  as  a  part  of  the  mesh 
refinement  algorithm.  The  reason  for  this  is  that  FAC  can  be  organized  so  that  the  decision 
to  further  refine  a  particular  grid  can  be  made  after  that  grid  has  been  processed. 

The  first  two  self-adaptive  schemes  we  tested  were  based  on  assessing  certain  properties 
of  the  emerging  solution.  We  primarily  used  various  Sobolev  norms  of  the  approximation 
to  determine  where  patches  were  needed.  These  methods  worked  sufficiently  well  for 
certain  problems,  but  did  not  seem  to  be  sufficiently  robust  to  serve  as  a  general  purpose 
self-adaptive  scheme. 

The scheme we are currently developing is based on a Richardson extrapolation of the error. This approach follows that used by Berger in [5] and is briefly described as follows.

Let $(i, j)$, $(I, J)$ and $(x, y)$ all denote the same physical point in a given grid, a coarser grid and the domain, respectively. Let $u_{ij}$, $U_{IJ}$ and $u(x,y)$ denote the solutions to their respective problems. Then, assuming that we have a $k$-th order numerical scheme, we can represent the error at the given point for each of the two grids by

$$e_{ij} = u(x,y) - u_{ij} = \delta x^k F^{(1)}_{ij} + \delta x^{k+1} F^{(2)}_{ij} + \cdots \tag{10}$$

and

$$e_{IJ} = u(x,y) - U_{IJ} = \Delta x^k F^{(1)}_{IJ} + \Delta x^{k+1} F^{(2)}_{IJ} + \cdots, \tag{11}$$

where $F^{(l)}_{ij}$ and $F^{(l)}_{IJ}$ are generally undetermined coefficients. Subtracting (10) from (11) and using $\Delta x = m\,\delta x$ gives

$$e_{IJ} - e_{ij} = u_{ij} - U_{IJ} = \left(m^k F^{(1)}_{IJ} - F^{(1)}_{ij}\right)\delta x^k + \left(m^{k+1} F^{(2)}_{IJ} - F^{(2)}_{ij}\right)\delta x^{k+1} + \cdots. \tag{12}$$

Assuming that $F^{(1)}_{ij} = F^{(1)}_{IJ}$ (they are Taylor coefficients evaluated at the same point), division by $m^k - 1$ yields

$$\frac{u_{ij} - U_{IJ}}{m^k - 1} = \delta x^k F^{(1)}_{ij} + \delta x^{k+1}\,\frac{m^{k+1} F^{(2)}_{IJ} - F^{(2)}_{ij}}{m^k - 1} + \cdots. \tag{13}$$

Comparing equations (10) and (13), we note that

$$e_{ij} = \frac{u_{ij} - U_{IJ}}{m^k - 1} + O\!\left(\delta x^{k+1}\right).$$

Thus, we use

$$e_{ij} \approx \frac{u_{ij} - U_{IJ}}{m^k - 1} \tag{14}$$

as our $(k+1)$st order approximation to the error. Of course, if some of the $F^{(1)}$'s and $F^{(2)}$'s are zero, the order of approximation of the error may be greater than expected.

The  difficulty  of  applying  equation  (14)  as  our  error  approximation  in  most  settings 
is  that  solutions  on  two  grids  are  needed.  In  the  FAC  algorithm,  it  is  necessary  to  solve 
on  one  additional  grid  to  apply  the  above  approximation  to  the  original  coarse  grid.  But 
since  this  grid  can  be  coarser  than  the  given  coarse  grid,  the  expense  of  this  is  not  too 
great.  Any  other  time  that  the  error  approximation  is  used  in  the  FAC  algorithm,  the 
present  grid  already  has  an  underlying  coarser  grid  that  can  be  used  for  the  Richardson 
approximation  to  the  error. 
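Estimate (14) needs only the two discrete solutions at their common points. A minimal sketch, assuming a hypothetical 1D model problem $-u'' = \pi^2 \sin(\pi x)$ with the second-order 3-point scheme ($k = 2$) and mesh ratio $m = 2$; the exact solution is known here only so that the estimate can be checked against the true error. Function names are ours.

```python
import numpy as np

def solve_poisson_1d(n, f):
    """3-point scheme for -u'' = f on (0, 1), u(0) = u(1) = 0, with n intervals."""
    h = 1.0 / n
    x = np.linspace(0.0, 1.0, n + 1)
    A = (np.diag(2.0 * np.ones(n - 1)) - np.diag(np.ones(n - 2), 1)
         - np.diag(np.ones(n - 2), -1)) / h**2
    u = np.zeros(n + 1)
    u[1:-1] = np.linalg.solve(A, f(x[1:-1]))
    return x, u

def richardson_error_estimate(u_fine, U_coarse, m, k):
    """Eq. (14): e_ij ~ (u_ij - U_IJ) / (m**k - 1) at points shared by both grids."""
    return (u_fine[::m] - U_coarse) / (m**k - 1)

f = lambda s: np.pi**2 * np.sin(np.pi * s)
x, u = solve_poisson_1d(32, f)          # given grid, spacing delta x
X, U = solve_poisson_1d(16, f)          # coarser grid, spacing m * delta x
est = richardson_error_estimate(u, U, m=2, k=2)
true = np.sin(np.pi * X) - u[::2]       # actual fine-grid error at the shared points
```

For this smooth problem the estimate agrees with the true error to within a fraction of a percent, consistent with the $O(\delta x^{k+1})$ remainder in (13).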

We  use  equation  (14)  as  an  approximation  to  the  error  in  order  to  flag  points  on  the 
grid  where  refinement  is  needed.  These  flagged  points  are  then  grouped  together  to  form 
patches  that  are  rectangular  in  shape  but  may  overlap.  For  efficiency,  we  use  a  scheme 
that  forms  small  patches,  then  checks  to  see  if  some  of  these  patches  should  be  combined  to 
make  larger  ones.  Because  we  allow  for  overlapping  irregular  patches,  we  have  implemented 
a  block  Gauss-Seidel  technique  that  uses  any  one  of  a  variety  of  solvers  for  the  blocks.  We 
are  also  implementing  a  multigrid  solver  to  treat  these  irregular  blocks  more  effectively. 
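The grouping rule is not spelled out here, so the following is only a hypothetical sketch of the flag-then-cluster idea: points whose estimated error exceeds a tolerance are flagged, each connected group of flagged points is enclosed in a small bounding rectangle, and two rectangles are combined when their joint bounding box is not much larger than the pieces. The 0.7 efficiency threshold and all function names are our assumptions.

```python
from collections import deque

import numpy as np

def flag_points(err_estimate, tol):
    """Flag grid points whose estimated error exceeds tol in magnitude."""
    return np.abs(err_estimate) > tol

def small_patches(flags):
    """Inclusive bounding boxes (i0, i1, j0, j1) of 8-connected groups of flags."""
    nx, ny = flags.shape
    seen = np.zeros_like(flags, dtype=bool)
    boxes = []
    for i in range(nx):
        for j in range(ny):
            if not flags[i, j] or seen[i, j]:
                continue
            seen[i, j] = True
            queue = deque([(i, j)])
            i0 = i1 = i
            j0 = j1 = j
            while queue:                          # breadth-first search of one group
                a, b = queue.popleft()
                i0, i1 = min(i0, a), max(i1, a)
                j0, j1 = min(j0, b), max(j1, b)
                for da in (-1, 0, 1):
                    for db in (-1, 0, 1):
                        p, q = a + da, b + db
                        if 0 <= p < nx and 0 <= q < ny and flags[p, q] and not seen[p, q]:
                            seen[p, q] = True
                            queue.append((p, q))
            boxes.append((i0, i1, j0, j1))
    return boxes

def box_area(b):
    return (b[1] - b[0] + 1) * (b[3] - b[2] + 1)

def merge_patches(boxes, efficiency=0.7):
    """Replace two boxes by their bounding box while the pair covers at least
    `efficiency` of it (an assumed, simple merging criterion)."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for p in range(len(boxes)):
            for q in range(p + 1, len(boxes)):
                a, b = boxes[p], boxes[q]
                c = (min(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), max(a[3], b[3]))
                if box_area(a) + box_area(b) >= efficiency * box_area(c):
                    boxes[p] = c
                    del boxes[q]
                    merged = True
                    break
            if merged:
                break
    return boxes

flags = np.zeros((10, 10), dtype=bool)
flags[1:3, 1:3] = True                  # two nearby clusters ...
flags[1:3, 4:6] = True
flags[7:9, 7:9] = True                  # ... and one far away
boxes = merge_patches(small_patches(flags))
```

The two nearby clusters end up in one rectangle while the distant one keeps its own patch; the resulting rectangles may overlap, which is why a block Gauss-Seidel treatment of overlapping patches is needed.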


5.  Results 

In this section, some sample computations are presented that illustrate the use of the FAC algorithm. While there are many problems that could have been chosen, the three examples presented highlight some of its features. The FAC software that has been developed includes a general purpose self-adaptive linear scheme, a cell-centered version of the linear scheme and a code that uses the nonlinear algorithm described in Section 2.

The  first  example  is  a  relatively  easy  problem  for  which  the  scheme  was  developed, 
the  five-spot  oil  reservoir  problem.  This  is  a  model  problem  for  oil  reservoir  simulation 
with a regular pattern of wells. The five-spot problem thus uses a square domain with
an  injection  and  production  well  at  opposing  corners.  The  assumption  is  that  an  IMPES 
scheme  is  being  used  so  that  the  saturations  have  already  been  calculated;  it  remains  to 
calculate  the  pressures  implicitly.  We  assume  the  domain  is  of  the  form  given  in  Figure 
4  with  patches  inserted  in  the  corners  near  the  wells.  The  problem  we  must  solve  is  as 
follows. 


Figure 6. Two-cycle solution injected into the coarse grid.
Figure 7. Fine grid solution on upper patch.
$$\nabla \cdot \left(\frac{k}{\mu}\,\nabla p\right) = \delta_{x_1} - \delta_{x_0} + f \quad \text{in } \Omega,$$

$$\frac{\partial p}{\partial n} = 0 \quad \text{on } \partial\Omega,$$

where $\Omega$ is the square pictured in Figure 4, $k$, $\mu$ and $f$ are known functions (of the saturation), and the points $x_0$ and $x_1$ at which the Dirac delta functions are centered are given in Figure 4.

To solve this problem, we use the cell-centered scheme, both because it is the most natural scheme to use with the Neumann boundary conditions and because cell-centered schemes are the ones used most often in the oil industry. In Figure 5 we display the coarse grid solution. This grid has only ten points in each direction, which is a very coarse grid, but it is indicative of the length scales that are necessary for such problems. As can be seen in Figure 5, the resolution is very poor. We now use patches at each of the wells that are the length of two coarse grid blocks in each direction, with a mesh ratio of four. Figure 6 displays the solution obtained after two FAC cycles. This approximation is the fine grid solution injected into the coarse grid in the patches and graphed along with the coarse grid solution in the remaining regions. The resolution is again very poor, but the accuracy at the wells has changed dramatically. Finally, in Figure 7 we display the graph of this solution on one of the patches. This shows that FAC is able to provide resolution and accuracy that are not necessarily visible on the coarse grid. (FAC actually improved accuracy outside the patches, but the scale of this improvement is too fine for visual detection.) The asymptotic rate of convergence (the ratio of consecutive $l_2$ norms of the residuals) for this problem was 0.0075, which is considerably better than we would expect.
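The convergence rate quoted above is a ratio of consecutive residual norms. A minimal sketch of how such a factor might be estimated from a recorded residual history follows; the function names, the use of a geometric mean over the last few cycles, and the synthetic residuals are assumptions for illustration, not data from these experiments.

```python
import math


def l2_norm(vec):
    """Discrete l2 norm of a residual vector."""
    return math.sqrt(sum(x * x for x in vec))


def convergence_factors(residuals):
    """Ratios ||r_{k+1}|| / ||r_k|| of consecutive residual l2 norms."""
    norms = [l2_norm(r) for r in residuals]
    return [norms[k + 1] / norms[k] for k in range(len(norms) - 1)]


def asymptotic_rate(residuals, tail=3):
    """Geometric mean of the last `tail` ratios (needs at least 2 residuals)."""
    ratios = convergence_factors(residuals)[-tail:]
    return math.prod(ratios) ** (1.0 / len(ratios))


# Synthetic residuals that shrink by a factor 0.0075 per cycle (not real data).
history = [[1.0, -2.0, 0.5]]
for _ in range(4):
    history.append([0.0075 * x for x in history[-1]])

print(round(asymptotic_rate(history), 6))  # 0.0075
```

Averaging over a few trailing cycles, rather than taking a single ratio, damps the cycle-to-cycle fluctuations that real residual histories usually show.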

The next example is designed to illustrate the self-adaptive capabilities of FAC. Consider the rectangle $R = (-1.5, 1.5) \times (-1, 2)$ and the following problem defined on $R$:

$$\Delta\phi = \sum_{i=1}^{N} a_i\,\delta_i(x) \quad \text{in } R,$$
$$\phi = g(x) \quad \text{on } \partial R,$$

where $\delta_i$ denotes the Dirac delta function centered at the point $x_i$, $a_i$ is the strength of the $i$th source or sink, and $g$ specifies the boundary data. In the tests described below, we used $N = 11$ and 25 points in each direction on the coarse grid. The usual 5-point stencil was used to discretize the Laplacian on both the coarse grid and the fine grid. Finite volume discretization was used at the grid interface.
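As a concrete illustration of the discretization just described, the sketch below applies the usual 5-point stencil on a uniform grid over $R$ and places each point source on its nearest node. It assumes that the 25 points per direction include the boundary (so $h = 3/24 = 0.125$) and that a delta of strength $a_i$ is represented by $a_i/h^2$ at a single node; both choices, and all function names, are hypothetical and not taken from the authors' code.

```python
def make_grid(n=25, x0=-1.5, x1=1.5, y0=-1.0, y1=2.0):
    """Node coordinates on R = (x0, x1) x (y0, y1) with n nodes per direction,
    boundary included (assumed reading of '25 points in each direction')."""
    h = (x1 - x0) / (n - 1)
    if abs((y1 - y0) / (n - 1) - h) > 1e-14:
        raise ValueError("this sketch assumes square cells")
    xs = [x0 + i * h for i in range(n)]
    ys = [y0 + j * h for j in range(n)]
    return xs, ys, h


def five_point_laplacian(u, h):
    """Standard 5-point stencil at interior nodes; boundary entries stay 0."""
    nx, ny = len(u), len(u[0])
    lap = [[0.0] * ny for _ in range(nx)]
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            lap[i][j] = (u[i + 1][j] + u[i - 1][j] + u[i][j + 1]
                         + u[i][j - 1] - 4.0 * u[i][j]) / (h * h)
    return lap


def point_source_rhs(xs, ys, h, sources):
    """Discrete right-hand side for sum_i a_i * delta(x - x_i).

    Each (px, py, a) is lumped onto the nearest node as a / h**2, so the
    grid 'integral' sum(f) * h**2 equals the total strength.
    """
    f = [[0.0] * len(ys) for _ in xs]
    for px, py, a in sources:
        i = min(range(len(xs)), key=lambda k: abs(xs[k] - px))
        j = min(range(len(ys)), key=lambda k: abs(ys[k] - py))
        f[i][j] += a / (h * h)
    return f


xs, ys, h = make_grid()
u = [[x * x + y * y for y in ys] for x in xs]   # exact Laplacian is 4
lap = five_point_laplacian(u, h)
print(h, lap[12][12])  # 0.125 4.0
```

The 5-point stencil is exact for quadratics, which makes $x^2 + y^2$ a convenient check; lumping each delta onto one node is only the simplest of several possible source treatments.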

The code decides where additional resolution is necessary, chooses the appropriate mesh refinement multiple according to the given tolerance and solves on the patches, passing this information back to the coarse grid as described in the FAC algorithm. As can be seen in Figure 9, the code has constructed one large irregular patch consisting of seven rectangular patches. For the first test, we allowed only two levels of grids. To meet the given tolerance, the code chose a mesh refinement multiple of four. All the results given below are from two cycles of FAC. Figure 8 is a plot of the solution calculated on the coarse grid. The locations of the wells and the resulting fine grid patches are shown in Figure 9. Finally, Figure 10 depicts the solution after two FAC cycles, with the fine grid solution given on the patches and the coarse grid solution linearly interpolated to a fine grid on the rest of the domain.
Figure 8. Coarse grid solution.
Figure 9. Sources, sinks and patches.
Figure 10. Composite grid solution after two cycles.
Figure 12. Three level composite grid solution.
Allowing a third level of refinement, the self-adaptive scheme chose the patches shown in Figure 11, which just cover each of the wells, with a mesh ratio of two. A graph of the solution of the three level composite grid problem is given in Figure 12. (For display purposes, the composite grid is mapped to the global grid with the resolution of the intermediate level.) Careful inspection shows a marked difference between the two and three level solutions at the wells.

To demonstrate that FAC can be used with nonlinear equations and to resolve shock fronts, we applied it to the thin disturbance transonic flow equations. This is also an excellent example of how important the treatment of the difference equations at the grid interface can be. Classical results concerning this problem can be found in Murman and Cole [6]. The specific problem we solved is

$$\left(\left(1 - M_\infty^2 - \frac{\gamma+1}{2}M_\infty^2\,\phi_x\right)\phi_x\right)_x + \phi_{yy} = 0 \quad \text{in } R = (-1, 2) \times (0, 1),$$
$$\frac{\partial\phi}{\partial n} = g \quad \text{on } \partial R.$$

Here, $g = 0$ except for $y = 0$ and $0 < x < 1$, where it is equal to the slope of the circular arc airfoil.

We used a coarse grid with 40 horizontal and 14 vertical points and a patch covering the area above the airfoil that is sufficiently high to include the entire region where the flow is hyperbolic. The solver used on each patch was vertical line SOR (the same solver used by Murman and Cole). As before, we display the results after two FAC cycles.

As can be seen in Figure 13, where for a subsonic case we plot the resulting pressure coefficient immediately above the airfoil, the coarse grid solution is the least accurate. One or two cycles of FAC using a five-point scheme at the grid interface improve the solution further. Use of a 7-point scheme at the grid interface produces results that are almost the same as those of the uniform fine grid solution. Although this is an easy problem, these results demonstrate that (1) FAC works for nonlinear elliptic problems, (2) even for relatively easy problems one cycle is not enough, and (3) even for easy problems care must be taken in how we treat the interior boundary.

Figure 13. Coefficient of pressure immediately above the airfoil. □ coarse solution, + one FAC cycle and o two FAC cycle solutions with a five-point flux operator at the grid interface, Δ two FAC cycle solution with a seven-point flux operator at the grid interface and × extensive fine grid solution.
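The pressure coefficients plotted in Figures 13 and 14, and the hyperbolic region that the patch must cover, can both be read off the computed potential. The sketch below assumes the usual small-disturbance normalization, in which $C_p \approx -2\phi_x$, and uses the coefficient of $\phi_{xx}$ in the equation above, $1 - M_\infty^2 - (\gamma+1)M_\infty^2\phi_x$, whose sign separates elliptic (subsonic) from hyperbolic (supersonic) points as in the Murman-Cole type-dependent approach. The value $\gamma = 1.4$ and all function names are assumptions for illustration.

```python
def tsd_type_coefficient(phi_x, mach, gamma=1.4):
    """Coefficient of phi_xx in the small-disturbance equation:
    1 - M^2 - (gamma + 1) * M^2 * phi_x  (> 0 elliptic, < 0 hyperbolic)."""
    return 1.0 - mach ** 2 - (gamma + 1.0) * mach ** 2 * phi_x


def central_phi_x(phi, dx):
    """Central-difference phi_x at the interior points of a line of values."""
    return [(phi[i + 1] - phi[i - 1]) / (2.0 * dx) for i in range(1, len(phi) - 1)]


def pressure_coefficient(phi_x):
    """Small-disturbance pressure coefficient, Cp = -2 * phi_x (assumed scaling)."""
    return [-2.0 * v for v in phi_x]


def hyperbolic_points(phi_x, mach, gamma=1.4):
    """Indices of points where the local flow type is hyperbolic (supersonic)."""
    return [i for i, v in enumerate(phi_x)
            if tsd_type_coefficient(v, mach, gamma) < 0.0]


# Hypothetical line of potential values just above the airfoil.
dx = 0.1
phi = [0.01 * k for k in range(11)]          # phi_x = 0.1 everywhere
print(pressure_coefficient(central_phi_x(phi, dx)))
print(hyperbolic_points([0.0, 0.2, 0.1], mach=0.85))     # [1]
```

In a type-dependent scheme the sign test above is what switches the $\phi_{xx}$ difference from central to upwind, which is why the patch has to contain every point where the coefficient becomes negative.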

We next increase the free-stream Mach number $M_\infty$ to obtain a transonic flow. As before, we present plots of the pressure coefficient immediately above the airfoil. As can be seen in Figure 14, the results using the conservative 7-point scheme on the interior boundary and those on the extensive fine grid are virtually identical. In this application, the actual savings in the number of grid points are not significant, but in larger scale three-dimensional problems the savings could be dramatic.

Other observations that can be made from the results presented in Figure 14 include the following. First, one FAC cycle gives very bad results. This is because there has not yet been any transfer of information from the fine grid to the coarse grid, so the accuracy is dictated by the coarse approximation obtained at the interface points. We include this plot because this method of mesh refinement is a common one. Note that two cycles of FAC can give results comparable to global refinement. The second point that must be made is that the results using the 5-point, approximately conservative flux operator, while not bad, are not nearly as good as those with the conservative 7-point one. This illustrates the importance of the interface treatment: the 5-point flux operator is natural, but not conservative. Apparently, the 7-point flux operator, which is conservative for the discrete problem and approximates the conservation of the continuous problem, does much better.


Figure 14. Coefficient of pressure immediately above the airfoil. □ coarse solution, o one FAC cycle and Δ two FAC cycle solution with a five-point flux operator at the grid interface, × two FAC cycle solution with a seven-point flux operator at the grid interface and + an extensive fine grid solution.
ABSTRACT. The two-phase Stefan problem in its enthalpy formulation is discretized implicitly in time, requiring the solution of an elliptic differential inclusion at each time-step, which is then approximated by standard finite difference techniques. Using a hierarchy of grids, two multi-grid methods are developed for the efficient solution of the resulting difference inclusions.

For one of them, a convergence result is given based on nonlinear multi-grid convergence theory and elementary subdifferential calculus. Finally, the performance of both algorithms is illustrated by some numerical results.


1. INTRODUCTION

The two-phase Stefan problem describes the space-time temperature distribution $\theta(x,t)$, $(x,t) \in Q := \Omega \times (T_0, T_1)$, of a heat-conducting substance undergoing a change of phase (e.g. from solid to liquid) at a prespecified temperature $\theta_c$. The substance is supposed to occupy a bounded domain $\Omega \subset \mathbb{R}^2$ with a prescribed initial temperature $\theta_0(x)$, $x \in \Omega$, at $t = T_0$ and a prescribed temperature $g(x,t)$, $x \in \Gamma$, $t \in (T_0, T_1)$, on the boundary $\Gamma = \partial\Omega$, which, for simplicity, we assume to be zero. The volumetric heat capacity, the latent energy content and the thermal conductivity are described by functions $c$, $s$ and $k$ of $\theta$, respectively. In particular, these functions are assumed to be piecewise smooth with jump discontinuities only at the change of phase temperature $\theta_c$, with $c$ and $s$ monotonically increasing, $k$ monotonically decreasing, $c$ and $k$ positive, and $k$ bounded away from zero. Finally, the function $f = f(x,t)$, $(x,t) \in Q$, refers to external heat sources and/or sinks.

The two phases of the substance are described by the sets $Q_i = \{(x,t) \in Q \mid (-1)^i(\theta(x,t) - \theta_c) > 0\}$, $i = 1,2$. Setting $\Sigma = \{(x,t) \in Q \mid \theta(x,t) = \theta_c\}$, the interfaces $\Sigma_i = \operatorname{cl} Q_i \cap \Sigma$, $i = 1,2$, are assumed to be sufficiently smooth oriented hypersurfaces. Then, taking $\alpha(\theta) = \int_{\theta_c}^{\theta} c(\tau)\,d\tau$ and denoting by $\nu_{\Sigma_i}$ the normal to $\Sigma_i$ and by $\pi\nu_{\Sigma_i}$ its projection into the plane of $\Omega$, away from the zone of phase change the temperature satisfies the diffusion equation

$$\frac{\partial}{\partial t}\bigl(\alpha(\theta) + s(\theta)\bigr) - \nabla\cdot\bigl(k(\theta)\nabla\theta\bigr) + f = 0, \quad (x,t) \in Q \setminus \Sigma, \tag{1.1a}$$

while at the interfaces, setting $h(\theta_c\pm) = \lim_{\theta \to \theta_c\pm} h(\theta)$ for $h = k, s$, the jump condition

$$k(\theta_c+)\,\nabla\theta\cdot\pi\nu_{\Sigma_2} - k(\theta_c-)\,\nabla\theta\cdot\pi\nu_{\Sigma_1} = s(\theta_c+)\cos(\nu_{\Sigma_2}, 1_t) - s(\theta_c-)\cos(\nu_{\Sigma_1}, 1_t) \tag{1.1b}$$

holds, relating the rate of phase change to the rate of absorption of heat energy.

Numerical methods for the approximate solution of two-phase Stefan problems which rely on the diffusion equation (1.1a) and the flux balance (1.1b) must take into account the space-time evolution of the interfaces and so are typically based on front tracking techniques (cf. e.g. [11]).

An alternative approach is to get rid of the flux condition (1.1b) by absorbing it into the differential equations, which are then solved on the fixed domain $Q$. This can be achieved by introducing a generalized temperature $u$ via the standard Kirchhoff transformation

$$u = K(\theta) = \int_{\theta_c}^{\theta} k(\tau)\,d\tau \tag{1.2}$$

and then defining a generalized enthalpy $H(u)$ by

$$H(u) = Q(K^{-1}u) \tag{1.3}$$

where $Q(t) = \alpha(t) + s(t)$, $t \neq 0$, is the standard enthalpy.

Due to the assumptions on $c$, $k$ and $s$, the enthalpy function $H$ is piecewise smooth with a jump discontinuity at $u = 0$ and a positive derivative for $u \neq 0$ which is bounded away from zero. Then, the enthalpy formulation of the two-phase Stefan problem is

$$\frac{\partial}{\partial t}H(u) - \Delta u + f = 0 \tag{1.4}$$

which, of course, has to be understood in an appropriate weak sense. In particular, a suitable solution class is $L^\infty((T_0,T_1); H^1(\Omega)) \cap H^1([T_0,T_1]; L^2(\Omega)) \cap L^\infty(Q)$ (cf. e.g. [10]).

If we formally discretize (1.4) implicitly in time with respect to a uniform partition $\{t_0, t_1, \dots, t_M\}$, $t_m = m\Delta t$, $0 \le m \le M$, $\Delta t = (T_1 - T_0)/M$, we arrive at a sequence of nonlinear elliptic boundary value problems

$$H(u^{m+1}) = H(u^m) + \Delta t\,\Delta u^{m+1} - \Delta t\,f^{m+1} \quad \text{in } \Omega, \tag{1.5a}$$
$$u^{m+1} = 0 \quad \text{on } \Gamma, \quad 0 \le m \le M-1, \tag{1.5b}$$

where $u^m$ is an approximation to $u$ at time $t_m$ and $f^{m+1} = f(\cdot, t_{m+1})$. Equations (1.5a) must still be interpreted in a suitable weak sense, since the enthalpy function $H$ is set-valued. It is shown in [9] that, for sufficiently small $\Delta t$, (1.5) admits a unique solution $u^m$, $0 \le m \le M$, whose piecewise linear prolongation to $Q$ converges in $L^2(Q)$ to the solution of (1.4) as $\Delta t \to 0$.

Fixing some appropriate $H^m \in H(u^m)$, (1.5) can be rewritten as the differential inclusion

$$-Lu^{m+1} + b^{m+1} \in H(u^{m+1}) \quad \text{in } \Omega, \tag{1.6a}$$
$$u^{m+1} = 0 \quad \text{on } \Gamma, \quad 0 \le m \le M-1, \tag{1.6b}$$

where $L = -\Delta t\,\Delta$ and $b^{m+1} = H^m - \Delta t\,f^{m+1}$. Concerning the solution of the above differential inclusion, we remark that $H$ is a maximal monotone graph in $\mathbb{R}^2$ and hence there exists a lower semicontinuous proper convex function $\Phi$ whose subgradient $\partial\Phi$ is given by $H$ (cf. e.g. [1]). Defining an enthalpy functional by $\varphi(v) = \int_\Omega \Phi(v(x))\,dx$, $v \in H_0^1(\Omega)$, the differential inclusion can be formulated as the nonlinear elliptic variational inequality

$$a(u^{m+1}, v - u^{m+1}) + \varphi(v) - \varphi(u^{m+1}) - (b^{m+1}, v - u^{m+1}) \ge 0, \quad v \in H_0^1(\Omega), \tag{1.7}$$

where $a(u,v) = \Delta t(\nabla u, \nabla v)$ and $(\cdot,\cdot)$ denotes the usual $L^2$-scalar product. On the other hand, (1.7) is the necessary and sufficient optimality condition for the minimization of the convex functional

$$J(v) = \tfrac12 a(v,v) + \varphi(v) - (b^{m+1}, v). \tag{1.8}$$

If $J$ is minimized over finite dimensional subspaces spanned by first order Lagrangian finite elements and the resulting finite dimensional minimization problems are solved by Gauss-Seidel relaxation, with relaxation parameter $\omega = 1$ where a phase change occurs and over-relaxation elsewhere, global convergence of the algorithm has been established in [6].

In this paper we consider finite difference discretizations of the differential inclusion (1.5) with respect to grid-point sets $\Omega_h$ of grid-length $h > 0$, resulting in the difference inclusions

$$-L_h u_h^{m+1} + b_h^{m+1} \in H(u_h^{m+1}) \quad \text{in } \Omega_h, \tag{1.9a}$$
$$u_h^{m+1} = 0 \quad \text{on } \Gamma_h = \partial\Omega_h, \tag{1.9b}$$

where $L_h = -\Delta t\,\Delta_h$, $\Delta_h$ denoting the standard five-point approximation of the Laplacian, and $b_h^{m+1} = -\Delta t\,f_h(\cdot, t_{m+1}) + H(u_h^m)$, $f_h(\cdot, t_{m+1})$ being a suitable approximation to $f(\cdot, t_{m+1})$ on $\Omega_h$. Ordering the grid-points according to $x_{h,1}, \dots, x_{h,N_h}$, $N_h = \operatorname{card}(\Omega_h)$, and incorporating the boundary conditions, (1.9) can be written algebraically as

$$-A_h u_h^{m+1} + b_h^{m+1} \in H(u_h^{m+1}) \tag{1.10}$$

where now $u_h^{m+1}$ and $b_h^{m+1}$ are the vectors with components $u_{h,i} = u_h(x_{h,i})$, $b_{h,i} = b_h(x_{h,i})$, $1 \le i \le N_h$, and $A_h$ is the sparse symmetric positive definite matrix associated with $L_h$.

In the sequel the emphasis is on the iterative solution of (1.10) by multi-grid techniques with respect to a hierarchy of grids. We will present two such methods, both of which are based on an equivalent reformulation of (1.9) resp. (1.10) as systems of nonlinear difference equations.

In particular, for the first scheme we construct a single-valued "modified enthalpy function" $H_h: \mathbb{R}^{N_h} \to \mathbb{R}^{N_h}$ which coincides with $H(u_h)$ away from the change of phase region and which is defined by interpolation between $H(\lambda)|_{\lambda<0}$ and $H(\lambda)|_{\lambda>0}$ when a change of phase occurs. Consequently, (1.10) is replaced by

$$H_h(u_h^{m+1}) + A_h u_h^{m+1} = b_h^{m+1}, \tag{1.11}$$

which also strongly suggests choosing $H^m = H_h(u_h^m)$ in the definition of the right-hand side $b_h^{m+1}$. On the other hand, the second scheme is based on a duality argument from convex analysis: observing $H = \partial\Phi$, (1.10) is equivalent to

$$u_h^{m+1} \in \partial\Phi^*(-A_h u_h^{m+1} + b_h^{m+1}) \tag{1.12}$$

where $\Phi^*$ denotes the conjugate of $\Phi$ (cf. [5]). Since here the subgradient $\partial\Phi^*$ is single-valued, (1.12) is no longer a difference inclusion, but a piecewise linear difference equation.

The paper is organized as follows: Section 2 contains a detailed description of both multi-grid algorithms, followed by a convergence proof for the second algorithm in Section 3, where nonlinear multi-grid convergence theory and elementary subdifferential calculus are used as basic tools. Finally, in Section 4 the efficiency of both schemes is illustrated by some numerical results.

2. THE MULTI-GRID ALGORITHMS

Throughout the following, we take $\theta_c = 0$ as the nominal phase change temperature and, for simplicity, we assume the functions $c$, $k$ and $s$ to be piecewise constant, i.e.

$$c(\theta) = c_1, \quad k(\theta) = k_1, \quad s(\theta) = s_1, \qquad \theta < 0, \tag{2.1a}$$
$$c(\theta) = c_2, \quad k(\theta) = k_2, \quad s(\theta) = s_2, \qquad \theta > 0, \tag{2.1b}$$

where $0 < c_1 \le c_2$, $0 < k_2 < k_1$, and $s$ is normalized such that $s_1 = 0$ and $s_2 = s > 0$. In view of (1.3), setting $a_i = c_i/k_i$, $i = 1,2$, the enthalpy function $H$ turns out to be

$$H(\lambda) = \begin{cases} a_1\lambda, & \lambda < 0, \\ [0, s], & \lambda = 0, \\ a_2\lambda + s, & \lambda > 0. \end{cases} \tag{2.2}$$
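For these piecewise constant data the construction (1.2), (1.3) is explicit: $u = k_i\theta$ and $\alpha(\theta) + s(\theta) = c_i\theta + s_i$, so $H(u) = a_i u + s_i$ away from $u = 0$. The following sketch, with hypothetical function names and an interval `(lo, hi)` standing in for the set value $[0,s]$ at $u = 0$, checks numerically that $H(K(\theta))$ reproduces the standard enthalpy for $\theta \neq 0$.

```python
def kirchhoff(theta, k1, k2):
    """u = K(theta) = integral_0^theta k for k = k1 (theta < 0), k2 (theta > 0)."""
    return k1 * theta if theta < 0 else k2 * theta


def standard_enthalpy(theta, c1, c2, s):
    """alpha(theta) + s(theta) for theta != 0, with s_1 = 0 and s_2 = s."""
    return c1 * theta if theta < 0 else c2 * theta + s


def enthalpy_H(u, a1, a2, s):
    """Set-valued H of (2.2), returned as a closed interval (lo, hi)."""
    if u < 0:
        return (a1 * u, a1 * u)
    if u > 0:
        return (a2 * u + s, a2 * u + s)
    return (0.0, s)


# Hypothetical data satisfying 0 < c1 <= c2 and 0 < k2 < k1.
c1, c2, k1, k2, s = 1.0, 2.0, 4.0, 1.0, 3.0
a1, a2 = c1 / k1, c2 / k2
for theta in (-2.0, -0.5, 0.75, 1.5):
    lo, hi = enthalpy_H(kirchhoff(theta, k1, k2), a1, a2, s)
    assert lo == hi == standard_enthalpy(theta, c1, c2, s)
print(enthalpy_H(0.0, a1, a2, s))  # (0.0, 3.0)
```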

We consider a hierarchy $\{\Omega_k\}_{k=0}^{l}$ of grids with step-sizes $h_{k+1} = h_k/2$, $0 \le k \le l-1$, given some $h_0 > 0$, and difference operators $L_k = -\Delta t\,\Delta_k$, $0 \le k \le l$, $\Delta_k$ denoting the standard five-point approximation of the Laplacian on the grid $\Omega_k$. Further, we choose grid functions $b_k$ on $\Omega_k$, which we consider as suitable approximations to $-\Delta t\,f(\cdot, t_{m+1}) + H(u^m)$. Then, we aim to solve the difference inclusion

$$-L_l u_l + b_l \in H(u_l) \quad \text{in } \Omega_l, \tag{2.3a}$$
$$u_l = 0 \quad \text{on } \Gamma_l = \partial\Omega_l, \tag{2.3b}$$

resp. the equivalent algebraic system

$$-A_l u_l + b_l \in H(u_l) \tag{2.4}$$

by multi-grid algorithms involving the given hierarchy of grids. The first approach is to replace the set-valued function $H$ by an appropriately chosen single-valued function $H_l$, thus getting rid of the inclusions in (2.4).

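Now, setting

$$\lambda_{l,i}(u_l) = -\sum_{j \neq i} a^l_{ij} u_{l,j} + b_{l,i}, \quad 1 \le i \le N_l, \tag{2.5}$$

where $a^l_{ij}$, $1 \le i,j \le N_l$, are the components of the matrix $A_l$, in view of (2.2) it follows readily from (2.4) that, for $1 \le i \le N_l$,

$$u_{l,i} < 0 \iff \lambda_{l,i}(u_l) < 0, \qquad u_{l,i} = 0 \iff 0 \le \lambda_{l,i}(u_l) \le s, \qquad u_{l,i} > 0 \iff \lambda_{l,i}(u_l) > s. \tag{2.6}$$

Since for $0 \le \lambda_{l,i}(u_l) \le s$ neither of the two branches $a_1 u_{l,i}$ and $a_2 u_{l,i} + s$ of (2.2) yields the correct change of phase temperature, it seems reasonable to use a convex combination of both equations:

$$-(A_l u_l)_i + b_{l,i} = \Bigl(1 - \frac{\lambda_{l,i}(u_l)}{s}\Bigr) a_1 u_{l,i} + \frac{\lambda_{l,i}(u_l)}{s}\,(a_2 u_{l,i} + s). \tag{2.7}$$

Obviously, (2.7) implies $u_{l,i} = 0$ and hence, setting

$$H_{l,i}(v_l) = \begin{cases} a_1 v_{l,i}, & \lambda_{l,i}(v_l) < 0, \\ \Bigl(1 - \dfrac{\lambda_{l,i}(v_l)}{s}\Bigr) a_1 v_{l,i} + \dfrac{\lambda_{l,i}(v_l)}{s}(a_2 v_{l,i} + s), & 0 \le \lambda_{l,i}(v_l) \le s, \\ a_2 v_{l,i} + s, & \lambda_{l,i}(v_l) > s, \end{cases} \tag{2.8}$$

the system (2.4) of inclusions is equivalent to the system of difference equations

$$-A_l u_l + b_l = H_l(u_l). \tag{2.9}$$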
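The single-valued replacement (2.8) can be sketched as follows. The point of (2.7) is that, with $\lambda_{l,i}$ frozen, the pointwise equation $-a^l_{ii}v + \lambda_{l,i} = H_{l,i}(v)$ has exactly the solution listed in (2.6); the helper `pointwise_solve` (a hypothetical name, as is `modified_enthalpy`) solves it branch by branch, and the usage lines check this together with the continuity of $H_{l,i}$ in $\lambda$ at $\lambda = 0$ and $\lambda = s$.

```python
def modified_enthalpy(v_i, lam, a1, a2, s):
    """H_{l,i} of (2.8), given lam = lambda_{l,i}(v_l) from (2.5)."""
    if lam < 0.0:
        return a1 * v_i
    if lam <= s:
        t = lam / s                      # convex-combination weight in [0, 1]
        return (1.0 - t) * a1 * v_i + t * (a2 * v_i + s)
    return a2 * v_i + s


def pointwise_solve(a_ii, lam, a1, a2, s):
    """Solve -a_ii * v + lam = H_{l,i}(v) for v with lam frozen (cf. (2.6))."""
    if lam < 0.0:
        return lam / (a1 + a_ii)
    if lam <= s:
        return 0.0                       # (2.7) forces the phase-change value
    return (lam - s) / (a2 + a_ii)


a1, a2, s, a_ii = 1.0, 0.5, 1.0, 2.0    # hypothetical coefficients
for lam in (-1.0, 0.0, 0.5, 1.0, 2.5):
    v = pointwise_solve(a_ii, lam, a1, a2, s)
    residual = -a_ii * v + lam - modified_enthalpy(v, lam, a1, a2, s)
    print(lam, v, abs(residual) < 1e-12)
```

Since $\lambda_{l,i}$ does not involve $v_{l,i}$ itself, $H_{l,i}$ is affine in $v_{l,i}$ for frozen neighbours, which is why each pointwise problem has a closed-form solution.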


It is to the above system that the standard nonlinear multi-grid approach will now be applied, involving at each level $0 \le k \le l$ the nonlinear mappings $F_k$ given by

$$F_k(v_k) = b_k - A_k v_k - H_k(v_k),$$

where $H_k$ is defined by (2.8) with $l$ replaced by $k$. Note that in the definition of $\lambda_{k,i}(v_k)$, $1 \le i \le N_k$, according to (2.5) the approximation $b_k$ of the continuous right-hand side on level $k$ is used. In particular, starting from an iterate $u_l^\nu$, $\nu \ge 0$, on level $l$, we first determine a smoothed iterate $\tilde u_l^\nu$ by performing $\kappa_1 \ge 0$ nonlinear Gauss-Seidel iterations with $u_l^\nu$ as start iterate, i.e. setting $u_l^{\nu,0} = u_l^\nu$ we compute

$$\tilde u_l^\nu = u_l^{\nu,\kappa_1}, \qquad u_l^{\nu,\kappa} = S_l(u_l^{\nu,\kappa-1}; \tilde b_l), \quad 1 \le \kappa \le \kappa_1, \tag{2.10}$$

where the operator $S_l(\cdot\,; \tilde b_l)$ formally denotes the performance of one Gauss-Seidel iteration step applied to the equation $F_l(u_l) = \tilde b_l$ with $\tilde b_l = 0$. Note that in view of (2.8) the components of $u_l^{\nu,\kappa}$ can be easily computed according to

$$u_{l,i}^{\nu,\kappa} = \begin{cases} d_{l,i}/(a_1 + a^l_{ii}), & d_{l,i} < 0, \\ 0, & 0 \le d_{l,i} \le s, \\ (d_{l,i} - s)/(a_2 + a^l_{ii}), & d_{l,i} > s, \end{cases} \tag{2.11}$$

where

$$d_{l,i} = -\sum_{j=1}^{i-1} a^l_{ij} u_{l,j}^{\nu,\kappa} - \sum_{j=i+1}^{N_l} a^l_{ij} u_{l,j}^{\nu,\kappa-1} + b_{l,i}, \quad 1 \le i \le N_l. \tag{2.12}$$
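A direct transcription of (2.11), (2.12) into a smoother might look like the sketch below. It uses a small dense one-dimensional tridiagonal matrix as a stand-in for $A_l$ (an assumption made only to keep the example self-contained; all names are hypothetical) and reports the largest violation of the inclusion (2.4), so one can watch repeated sweeps drive it to zero. Each update minimizes a discrete counterpart of the convex functional (1.8) in one coordinate, so plain sweeps can be expected to converge, though slowly on fine grids.

```python
def tridiag(n, diag, off):
    """Dense n x n tridiagonal matrix, a 1D stand-in for A_l."""
    return [[diag if i == j else (off if abs(i - j) == 1 else 0.0)
             for j in range(n)] for i in range(n)]


def gauss_seidel_sweep(A, b, u, a1, a2, s):
    """One nonlinear Gauss-Seidel sweep (2.11), (2.12) for -A u + b in H(u).

    u is updated in place, so components j < i already carry their new values.
    """
    n = len(u)
    for i in range(n):
        d = b[i] - sum(A[i][j] * u[j] for j in range(n) if j != i)
        if d < 0.0:
            u[i] = d / (a1 + A[i][i])
        elif d <= s:
            u[i] = 0.0
        else:
            u[i] = (d - s) / (a2 + A[i][i])
    return u


def inclusion_violation(A, b, u, a1, a2, s, tol=1e-12):
    """Largest violation of (-A u + b)_i in H(u_i), with H from (2.2)."""
    n, worst = len(u), 0.0
    for i in range(n):
        r = b[i] - sum(A[i][j] * u[j] for j in range(n))
        if u[i] < -tol:
            worst = max(worst, abs(r - a1 * u[i]))
        elif u[i] > tol:
            worst = max(worst, abs(r - (a2 * u[i] + s)))
        else:                                  # u_i = 0 requires r in [0, s]
            worst = max(worst, -r, r - s)
    return worst


A = tridiag(7, 2.0, -1.0)                      # hypothetical Delta t / h^2 = 1
b = [3.0, 2.0, 0.5, 0.0, -0.5, -2.0, -3.0]
a1, a2, s = 1.0, 0.5, 1.0
u = [0.0] * 7
for _ in range(500):
    gauss_seidel_sweep(A, b, u, a1, a2, s)
print(inclusion_violation(A, b, u, a1, a2, s) < 1e-10)  # True
```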

Next, choosing appropriate prolongation and restriction operators $p^l_{l-1}$ resp. $r^{l-1}_l$, a new iterate $\tilde u_l^{\nu,\text{new}}$ will be determined by

$$\tilde u_l^{\nu,\text{new}} = \tilde u_l^\nu + \omega\,p^l_{l-1}\bigl(u_{l-1} - r^{l-1}_l \tilde u_l^\nu\bigr) \tag{2.13}$$

where $\omega$ is a suitable relaxation parameter and $u_{l-1}$ is the solution of the difference equation

$$F_{l-1}(u_{l-1}) = F_{l-1}(r^{l-1}_l \tilde u_l^\nu) - r^{l-1}_l F_l(\tilde u_l^\nu) =: \tilde b_{l-1}. \tag{2.14}$$

Finally, the $(\nu+1)$-st iterate $u_l^{\nu+1}$ will be obtained by $\kappa_2 \ge 0$ Gauss-Seidel iterations starting from $\tilde u_l^{\nu,\text{new}}$, i.e. setting $\tilde u_l^{\nu,0} = \tilde u_l^{\nu,\text{new}}$ we compute

$$u_l^{\nu+1} = \tilde u_l^{\nu,\kappa_2}, \qquad \tilde u_l^{\nu,\kappa} = S_l(\tilde u_l^{\nu,\kappa-1}; \tilde b_l), \quad 1 \le \kappa \le \kappa_2. \tag{2.15}$$


In case of more than two grids, the solution of the correction equation (2.14) on level $l-1$ will be replaced by a corresponding two-level iteration involving the grids $\Omega_{l-1}$ and $\Omega_{l-2}$, and this process will be continued until the lowest level $k = 0$ is reached. On the coarsest grid $\Omega_0$ an approximation to the correction equation will be determined by performing $\kappa_3 > 0$ Gauss-Seidel iterations.

Moreover, at each intermediate level $1 \le k \le l-1$ we will provide the option to perform several, say $\gamma_k$, reduced multi-grid cycles for the computation of an approximation to the correction equation on that level. Consequently, the complete multi-grid algorithm can be described by the following procedure MGSTEF1($l, u_l, \tilde b_l$) with $\tilde b_l = 0$, and with $u_l = u_l^\nu$ before resp. $u_l = u_l^{\nu+1}$ after the execution of the algorithm:

    procedure MGSTEF1(l, u_l, b_l); integer i, l; array u_l, b_l;
    if l = 0 then
        for i := 1 step 1 until κ_3 do u_0 := S_0(u_0; b_0) else
    begin array u_{l-1}, b_{l-1};
        for i := 1 step 1 until κ_1 do u_l := S_l(u_l; b_l);
        u_{l-1} := r_l^{l-1} u_l;
        b_{l-1} := F_{l-1}(r_l^{l-1} u_l) - r_l^{l-1}(F_l(u_l) - b_l);
        for i := 1 step 1 until γ_{l-1} do MGSTEF1(l-1, u_{l-1}, b_{l-1});
        u_l := u_l + ω p_{l-1}^l (u_{l-1} - r_l^{l-1} u_l);
        for i := 1 step 1 until κ_2 do u_l := S_l(u_l; b_l)
    end MGSTEF1.


A suitable start iterate can be provided by nested iteration, incorporating the already known values of $u_k^m$, $0 \le k \le l$, at time $t = t_m$ by assuming that $u_k^{m+1} - u_k^m \approx \tilde p^k_{k-1}(u_{k-1}^{m+1} - u_{k-1}^m)$ with prolongations $\tilde p^k_{k-1}$, $1 \le k \le l$, which are not necessarily the same as those used in MGSTEF1. To be more precise, having determined an approximation $u_0^{m+1}$ on the coarsest grid, at each intermediate level $1 \le k \le l-1$ we compute an approximation $u_k^{m+1}$ by performing a certain number, say $\mu_k$, of multi-grid cycles MGSTEF1($k, u_k^{m+1}, b_k^{m+1}$) starting from $u_k^m + \tilde p^k_{k-1}(u_{k-1}^{m+1} - u_{k-1}^m)$. The complete algorithm is described by the following procedure NISTEF1($l, u_l^{m+1}, u_l^m, b_l^{m+1}$):

    procedure NISTEF1(l, u_l^{m+1}, u_l^m, b_l^{m+1});
    integer l; array u_l^{m+1}, u_l^m, b_l^{m+1};
    begin integer i, k;
        for k := 0 step 1 until l-1 do
        begin
            if k = 0 then u_k^{m+1} := u_k^m else
                u_k^{m+1} := u_k^m + p̃_{k-1}^k (u_{k-1}^{m+1} - u_{k-1}^m);
            for i := 1 step 1 until μ_k do MGSTEF1(k, u_k^{m+1}, b_k^{m+1})
        end;
        u_l^{m+1} := u_l^m + p̃_{l-1}^l (u_{l-1}^{m+1} - u_{l-1}^m)
    end NISTEF1.
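The start iterate used in NISTEF1 only prolongates the time increment from the next coarser grid. A one-dimensional sketch follows; it assumes linear interpolation for $\tilde p^k_{k-1}$, zero Dirichlet values, and a fine grid with $2n+1$ interior nodes over a coarse grid with $n$ (all hypothetical choices, as are the names). If the increment happens to be piecewise linear on the coarse grid, the start iterate is exact.

```python
def interpolate(v_c):
    """Linear interpolation to 2*n_c + 1 fine interior nodes, zero boundary values."""
    ext = [0.0] + list(v_c) + [0.0]
    return [v_c[(i - 1) // 2] if i % 2 == 1 else 0.5 * (ext[i // 2] + ext[i // 2 + 1])
            for i in range(2 * len(v_c) + 1)]


def nested_start(u_m_fine, u_m_coarse, u_mp1_coarse):
    """Start iterate u_k^m + p~(u_{k-1}^{m+1} - u_{k-1}^m) used in NISTEF1."""
    increment = [new - old for new, old in zip(u_mp1_coarse, u_m_coarse)]
    return [u + d for u, d in zip(u_m_fine, interpolate(increment))]


def hat(x):
    """Piecewise linear on (0, 1) with its kink at the coarse node x = 0.5."""
    return 1.0 - abs(x - 0.5) / 0.5


# Hypothetical data: fine nodes x_i = (i+1)/8, coarse nodes X_I = (I+1)/4.
u_m_fine = [0.3, -0.1, 0.4, 0.0, -0.2, 0.1, 0.05]
u_m_coarse = [u_m_fine[1], u_m_fine[3], u_m_fine[5]]     # injected old values
u_mp1_coarse = [u + hat((I + 1) / 4) for I, u in enumerate(u_m_coarse)]
start = nested_start(u_m_fine, u_m_coarse, u_mp1_coarse)
exact = [u + hat((i + 1) / 8) for i, u in enumerate(u_m_fine)]
print(max(abs(a - b) for a, b in zip(start, exact)) < 1e-12)   # True
```

Prolongating the increment rather than $u_{k-1}^{m+1}$ itself keeps the fine-grid detail already present in $u_k^m$, which is the motivation for the form used in NISTEF1.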


REMARK. Concerning the choice of the relaxation parameter $\omega$ and of the prolongations $p^k_{k-1}$ resp. $\tilde p^k_{k-1}$ and restrictions $r^{k-1}_k$ in MGSTEF1 resp. NISTEF1, numerical evidence (cf. Section 4) suggested using underrelaxation (i.e. $\omega < 1$) and choosing $p^k_{k-1}$, $\tilde p^k_{k-1}$ resp. $r^{k-1}_k$ as the standard prolongation based on bilinear interpolation resp. the corresponding full weighted restriction.


The second approach to the multi-grid solution of the difference inclusion (2.3) resp. the algebraic system (2.4) is based on the difference equation (1.12) involving the conjugate $\Phi^*$ of $\Phi$ (recall that $H = \partial\Phi$). Now, in view of (2.2) an easy calculation reveals that $\Phi$ and its conjugate $\Phi^*$ are given by

$$\Phi(\lambda) = \tfrac12 a_2\lambda_+^2 + \tfrac12 a_1\lambda_-^2 + s\lambda_+, \tag{2.16}$$
$$\Phi^*(\lambda) = \tfrac12 a_2^{-1}\bigl((\lambda - s)_+\bigr)^2 + \tfrac12 a_1^{-1}\lambda_-^2, \tag{2.17}$$

where $\lambda_+ = \max(\lambda, 0)$ and $\lambda_- = \min(\lambda, 0)$. Consequently, the subgradient $\partial\Phi^*$ is the piecewise linear continuous function

$$\partial\Phi^*(\lambda) = \begin{cases} a_2^{-1}(\lambda - s), & \lambda > s, \\ 0, & \lambda \in [0, s], \\ a_1^{-1}\lambda, & \lambda < 0, \end{cases} \tag{2.18}$$

and therefore, on the finest grid $\Omega_l$, (1.12) can be written as the piecewise linear difference equation

$$F_l(u_l) = u_l - \partial\Phi^*(-A_l u_l + b_l) = 0. \tag{2.19}$$
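The formulas (2.16) to (2.18) are easy to check numerically. The sketch below (function names hypothetical) evaluates $\Phi$, $\Phi^*$ and $\partial\Phi^*$ and tests two standard facts from convex analysis: $\partial\Phi^*$ inverts the graph $H$ of (2.2), and the Fenchel-Young equality $\Phi(u) + \Phi^*(\mu) = u\mu$ holds exactly when $\mu \in H(u)$.

```python
def phi(lam, a1, a2, s):
    """Phi of (2.16); its subgradient is the graph H of (2.2)."""
    lp, lm = max(lam, 0.0), min(lam, 0.0)
    return 0.5 * a2 * lp * lp + 0.5 * a1 * lm * lm + s * lp


def phi_star(mu, a1, a2, s):
    """Convex conjugate Phi* of (2.17)."""
    p, m = max(mu - s, 0.0), min(mu, 0.0)
    return 0.5 * p * p / a2 + 0.5 * m * m / a1


def dphi_star(mu, a1, a2, s):
    """Single-valued subgradient of Phi*, eq. (2.18)."""
    if mu > s:
        return (mu - s) / a2
    if mu >= 0.0:
        return 0.0
    return mu / a1


a1, a2, s = 1.0, 0.5, 1.0                      # hypothetical coefficients
# Pairs (u, mu) with mu in H(u): solid, phase-change plateau, liquid.
pairs = [(-2.0, a1 * -2.0), (0.0, 0.0), (0.0, 0.7), (0.0, s), (1.5, a2 * 1.5 + s)]
for u, mu in pairs:
    assert dphi_star(mu, a1, a2, s) == u                       # dPhi* inverts H
    assert abs(phi(u, a1, a2, s) + phi_star(mu, a1, a2, s) - u * mu) < 1e-12
print("Fenchel-Young equality holds on H")
```

The whole plateau $[0, s]$ of $H$ at $u = 0$ is mapped back to the single value $0$, which is exactly why (2.19) is an equation rather than an inclusion.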

Then, given an iterate $u_l^\nu$, $\nu \ge 0$, we perform $\kappa_1 \ge 0$ smoothing iterations as in (2.10), where now $S_l(\cdot\,; b_l)$ describes a nonlinear Gauss-Seidel iteration applied to the equation $F_l(u_l) = 0$ from (2.19). Observing (2.18), it turns out that the components of the Gauss-Seidel iterates can be computed in exactly the same way as described by (2.11), (2.12). Having determined the smoothed iterate $\tilde u_l^\nu$, we wish to find a correction $w_l$ such that

$$-A_l(\tilde u_l^\nu + w_l) + b_l \in H(\tilde u_l^\nu + w_l). \tag{2.20}$$

Rewriting (2.20) as

$$\tilde u_l^\nu + w_l = \partial\Phi^*\bigl(-A_l w_l - (A_l\tilde u_l^\nu - b_l)\bigr), \tag{2.21}$$

we see that $w_l$ can be approximated by $\tilde w_l = p^l_{l-1}(u_{l-1} - r^{l-1}_l\tilde u_l^\nu)$, where $u_{l-1}$ is the solution of the difference equation

$$u_{l-1} = \partial\Phi^*\bigl(-A_{l-1}u_{l-1} + \tilde b_{l-1}\bigr), \tag{2.22a}$$

or equivalently, of the inclusion

$$-A_{l-1}u_{l-1} + \tilde b_{l-1} \in H(u_{l-1}), \tag{2.22b}$$

where

$$\tilde b_{l-1} = A_{l-1}r^{l-1}_l\tilde u_l^\nu - r^{l-1}_l(A_l\tilde u_l^\nu - b_l). \tag{2.23}$$

Thus, we determine a new iterate $\tilde u_l^{\nu,\text{new}}$ by

$$\tilde u_l^{\nu,\text{new}} = \tilde u_l^\nu + p^l_{l-1}\bigl(u_{l-1} - r^{l-1}_l\tilde u_l^\nu\bigr) \tag{2.24}$$

and we compute $u_l^{\nu+1}$ by $\kappa_2 \ge 0$ Gauss-Seidel iterations as in (2.15).
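To make the coarse-grid correction (2.23), (2.24) concrete, here is a one-dimensional two-grid sketch. It assumes dense tridiagonal matrices for $A_l$ and $A_{l-1}$ (the coarse one scaled by $1/4$ because $h$ doubles), pointwise injection for $r^{l-1}_l$ and linear interpolation for $p^l_{l-1}$; names, sizes and data are hypothetical. It checks the fixed-point requirement for the exact solution: if $\tilde u_l^\nu$ already solves (2.4) and injection is used, the coarse problem (2.22b) is solved by $r^{l-1}_l\tilde u_l^\nu$, so the correction vanishes.

```python
def tridiag(n, diag, off):
    return [[diag if i == j else (off if abs(i - j) == 1 else 0.0)
             for j in range(n)] for i in range(n)]


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def gs_sweep(A, b, u, a1, a2, s):
    """Nonlinear Gauss-Seidel sweep (2.11), (2.12) for -A u + b in H(u)."""
    n = len(u)
    for i in range(n):
        d = b[i] - sum(A[i][j] * u[j] for j in range(n) if j != i)
        if d < 0.0:
            u[i] = d / (a1 + A[i][i])
        elif d <= s:
            u[i] = 0.0
        else:
            u[i] = (d - s) / (a2 + A[i][i])
    return u


def inject(u_f):
    """Pointwise restriction: coarse interior node I sits at fine index 2I+1."""
    return [u_f[2 * I + 1] for I in range((len(u_f) - 1) // 2)]


def interpolate(v_c):
    """Linear interpolation to 2*n_c + 1 fine interior nodes, zero boundary."""
    ext = [0.0] + list(v_c) + [0.0]
    return [v_c[(i - 1) // 2] if i % 2 == 1 else 0.5 * (ext[i // 2] + ext[i // 2 + 1])
            for i in range(2 * len(v_c) + 1)]


def coarse_correction(A_f, A_c, b_f, u_f, a1, a2, s, sweeps=2000):
    """Correction p(u_{l-1} - r u_f) of (2.24) with b~_{l-1} from (2.23)."""
    ru = inject(u_f)
    defect = [x - y for x, y in zip(matvec(A_f, u_f), b_f)]      # A_l u - b_l
    b_c = [x - y for x, y in zip(matvec(A_c, ru), inject(defect))]
    u_c = [0.0] * len(ru)
    for _ in range(sweeps):              # 'exact' coarse solve by plain sweeps
        gs_sweep(A_c, b_c, u_c, a1, a2, s)
    return interpolate([x - y for x, y in zip(u_c, ru)])


a1, a2, s = 1.0, 0.5, 1.0
A_f = tridiag(15, 2.0, -1.0)             # Delta t / h^2 = 1 on the fine grid
A_c = tridiag(7, 0.5, -0.25)             # coarse grid: h doubled
b_f = [2.0, 2.5, 3.0, 2.5, 2.0, 1.0, 0.5, 0.0,
       -0.5, -1.0, -2.0, -2.5, -3.0, -2.5, -2.0]
u_star = [0.0] * 15
for _ in range(3000):                    # fine solution of (2.4)
    gs_sweep(A_f, b_f, u_star, a1, a2, s)
corr = coarse_correction(A_f, A_c, b_f, u_star, a1, a2, s)
print(max(abs(c) for c in corr) < 1e-8)  # True: u_star is a fixed point
```

With injection, the restricted residual at each coarse node is just the fine residual at the coinciding fine node, so it automatically lies in $H$ of the restricted value; this is the property that full weighting can destroy near the interface.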

So far we have described the two-grid situation. In case of more than two grids, the solution of the correction inclusion (2.22b) on level $l-1$ is replaced by a corresponding two-level iteration involving the grids $\Omega_{l-1}$ and $\Omega_{l-2}$, and this process is continued until the lowest level $k = 0$ is reached. On the coarsest grid $\Omega_0$ an approximation to the correction inclusion will be determined by performing $\kappa_3 > 0$ Gauss-Seidel iterations.

A condition which the multi-grid algorithm obviously should fulfil is that the solution $u_l^*$ of the difference inclusion (2.4) on level $l$ is a fixed point of the iteration process. In view of (2.24) this is guaranteed if $r^{l-1}_l u_l^*$ is the unique solution of the correction inclusion on level $l-1$. Examining (2.22), it turns out that the restriction operator $r^{l-1}_l$ must be chosen with care in order to ensure $u_{l-1} = r^{l-1}_l u_l^*$. To investigate more carefully the difficulties which might arise, we decompose the grid-point sets $\Omega_k$, $0 \le k \le l$, according to

$$\Omega_k = \Omega_k^1(u_k) \cup \Omega_k^2(u_k) \cup \Sigma_k(u_k), \tag{2.25}$$

where

$$\Omega_k^i(u_k) = \{x \in \Omega_k \mid (-1)^i u_k(x) > 0\}, \quad i = 1,2, \tag{2.26a}$$
$$\Sigma_k(u_k) = \{x \in \Omega_k \mid u_k(x) = 0\}. \tag{2.26b}$$


Further, for each $x \in \Omega_k$ we define $N_k(x)$ as the set consisting of $x$ and its eight neighbouring grid-points, i.e. $N_k(x) = \{x, x \pm h_k e_j \mid 1 \le j \le 4\}$, $e_1 = (1,0)$, $e_2 = (0,1)$, $e_3 = e_1 + e_2$, $e_4 = e_1 - e_2$, for all $x \in \Omega_k$ with the possible exception of points adjacent to $\Gamma_k = \partial\Omega_k$, where the $e_j$'s have to be defined appropriately. Then the sets

$$\Sigma_k^i(u_k) = \{x \in \Omega_k^i(u_k) \mid N_k(x) \not\subset \Omega_k^i(u_k)\}, \quad i = 1,2,$$
$$\tilde\Sigma_k(u_k) = \{x \in \Sigma_k(u_k) \mid N_k(x) \not\subset \Sigma_k(u_k)\} \tag{2.27}$$

will be called "discrete interfaces". Moreover, a grid-point $x \in \Omega_{k-1}$ will be called regular with respect to the grid function $u_k$ on level $k$ if $N_k(x) \subset \Omega_k^i(u_k)$ for some $i \in \{1,2\}$ or $N_k(x) \subset \Sigma_k(u_k)$, and irregular otherwise. Now, if $p^k_{k-1}$ is chosen as bilinear interpolation and $r^{k-1}_k$ as the corresponding full weighted restriction, the identity $u_{l-1} = r^{l-1}_l u_l^*$ will in general be violated at grid-points $x \in \Omega_{l-1}$ which are irregular with respect to $u_l^*$, whereas at such grid-points it obviously holds if pointwise restriction is used. (At a regular point, $H$ is affine on all of $N_k(x)$, or equal to the convex set $[0,s]$ there, so averaging does no harm; at an irregular point the full weighting mixes values from different branches of (2.2).) Hence, denoting by $\tilde r^{k-1}_k$ the full weighted and by $\hat r^{k-1}_k$ the pointwise restriction, a convenient choice is

$$(r^{k-1}_k u_k)(x) = \begin{cases} (\tilde r^{k-1}_k u_k)(x), & \text{if } x \text{ is regular}, \\ (\hat r^{k-1}_k u_k)(x), & \text{if } x \text{ is irregular}. \end{cases} \tag{2.28}$$
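The selective restriction (2.28) can be sketched on a small two-dimensional grid as follows. The sketch assumes fine grids with $2n+1$ nodes per direction (boundary included), coarse node $(I,J)$ located at fine node $(2I,2J)$, the standard 9-point full weighting stencil $\frac{1}{16}[1\,2\,1;\,2\,4\,2;\,1\,2\,1]$, and plain injection at coarse boundary nodes; these choices and all names are illustrative only. In the usage lines the grid function is negative on the left, zero on one column and positive (with a different slope) on the right, so plain full weighting at the interface column would produce a spurious positive value.

```python
FULL_WEIGHTS = ((1, 2, 1), (2, 4, 2), (1, 2, 1))   # standard 9-point stencil / 16


def neighbourhood(u_f, i, j):
    """Values of u on N(x): the fine node (i, j) and its eight neighbours."""
    return [u_f[i + di][j + dj] for di in (-1, 0, 1) for dj in (-1, 0, 1)]


def is_regular(u_f, i, j):
    """Regular: N(x) lies in one phase region or entirely in the zero set."""
    vals = neighbourhood(u_f, i, j)
    return (all(v < 0 for v in vals) or all(v > 0 for v in vals)
            or all(v == 0 for v in vals))


def full_weighting(u_f, i, j):
    return sum(FULL_WEIGHTS[di + 1][dj + 1] * u_f[i + di][j + dj]
               for di in (-1, 0, 1) for dj in (-1, 0, 1)) / 16.0


def selective_restriction(u_f):
    """(2.28): full weighting at regular coarse points, injection otherwise."""
    n_c = (len(u_f) - 1) // 2 + 1
    u_c = [[0.0] * n_c for _ in range(n_c)]
    for I in range(n_c):
        for J in range(n_c):
            i, j = 2 * I, 2 * J
            interior = 0 < I < n_c - 1 and 0 < J < n_c - 1
            if interior and is_regular(u_f, i, j):
                u_c[I][J] = full_weighting(u_f, i, j)
            else:
                u_c[I][J] = float(u_f[i][j])
    return u_c


# Hypothetical 9 x 9 fine grid function: slope 1 for j < 4, slope 2 for j > 4.
u_f = [[float(j - 4) if j < 4 else 2.0 * (j - 4) for j in range(9)] for _ in range(9)]
u_c = selective_restriction(u_f)
print([u_c[2][J] for J in range(5)])   # [-4.0, -2.0, 0.0, 4.0, 8.0]
print(full_weighting(u_f, 4, 4))       # 0.25, what plain full weighting would give
```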


Moreover, at each level $1 \le k \le l$ the coarse-to-fine transfer should only affect grid-points in $R_k(u_k) = \Omega_k \setminus \bigl(\Sigma_k^1(u_k) \cup \tilde\Sigma_k(u_k) \cup \Sigma_k^2(u_k)\bigr)$. That is, denoting by $\bar p^k_{k-1}$ the prolongation based on bilinear interpolation, we set

$$(p^k_{k-1}u_{k-1})(x) = \begin{cases} (\bar p^k_{k-1}u_{k-1})(x), & \text{if } x \in R_k(u_k), \\ 0, & \text{otherwise}. \end{cases} \tag{2.29}$$

Then, the complete multi-grid algorithm will be described by the following procedure MGSTEF2($l, u_l, b_l$) with $u_l = u_l^\nu$ before and $u_l = u_l^{\nu+1}$ after the execution of the algorithm:

    procedure MGSTEF2(l, u_l, b_l); integer i, l; array u_l, b_l;
    if l = 0 then
        for i := 1 step 1 until κ_3 do u_0 := S_0(u_0; b_0) else
    begin array u_{l-1}, b_{l-1};
        for i := 1 step 1 until κ_1 do u_l := S_l(u_l; b_l);
        u_{l-1} := r_l^{l-1} u_l;
        b_{l-1} := A_{l-1} r_l^{l-1} u_l - r_l^{l-1}(A_l u_l - b_l);
        for i := 1 step 1 until γ_{l-1} do MGSTEF2(l-1, u_{l-1}, b_{l-1});
        u_l := u_l + p_{l-1}^l (u_{l-1} - r_l^{l-1} u_l);
        for i := 1 step 1 until κ_2 do u_l := S_l(u_l; b_l)
    end MGSTEF2.


REMARK. The two multi-grid algorithms MGSTEF1 and MGSTEF2 mainly differ in the construction of the correction process. In particular, the correction equations (2.14) in MGSTEF1 cannot be reinterpreted as difference inclusions, as is the case for the correction equations (2.22a) in MGSTEF2. However, if one applies the standard nonlinear multi-grid approach to equation (2.19), then the resulting scheme is closely related to MGSTEF1. Moreover, if conversely the correction equations for (2.9) are chosen as $-A_{l-1}u_{l-1} + A_{l-1}r^{l-1}_l\tilde u_l - r^{l-1}_l(A_l\tilde u_l - b_l) = H_{l-1}(u_{l-1})$ according to the strategy in MGSTEF2 (with $b_{l-1}$ in the definition of $H_{l-1}$ replaced by $\tilde b_{l-1}$), then both schemes actually coincide.


3.  CONVERGENCE  RESULTS 

In this section we will give a local convergence result for the multi-grid algorithm MGSTEF2$(l, u_l^\nu, b_l)$ which is based on standard nonlinear multi-grid convergence theory (cf. e.g. [8]) and elementary subdifferential calculus (cf. e.g. [4]).

As a preparation, we begin with some facts about the nonlinear mapping $F_h$ given by

$$ F_h(v_h) = v_h - \partial\Phi^*(-A_h v_h + b_h). \qquad (3.1) $$

Being piecewise linear and continuous, $F_h$ admits a generalized Jacobian $\partial F_h(\cdot)$ in the sense of Clarke (cf. e.g. [4]) satisfying

$$ \partial F_h(v_h) \subseteq \partial F_{h,1}(v_h) \times \cdots \times \partial F_{h,N_h}(v_h), \quad v_h \in \mathbb{R}^{N_h}. \qquad (3.2) $$


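Note that the right-hand side in (3.2) denotes the set of all matrices whose $i$-th row belongs to $\partial F_{h,i}(v_h)$, the generalized gradient of the $i$-th component of $F_h$ at $v_h$, which, in view of (2.18), is given by

$$ \partial F_{h,i}(v_h) = \begin{cases} e_h^i + a_2^{-1}A_{h,i}, & (-A_h v_h + b_h)_i > s,\\ \mathrm{co}(e_h^i,\ e_h^i + a_2^{-1}A_{h,i}), & (-A_h v_h + b_h)_i = s,\\ e_h^i, & 0 < (-A_h v_h + b_h)_i < s,\\ \mathrm{co}(e_h^i,\ e_h^i + a_1^{-1}A_{h,i}), & (-A_h v_h + b_h)_i = 0,\\ e_h^i + a_1^{-1}A_{h,i}, & (-A_h v_h + b_h)_i < 0, \end{cases} \qquad (3.3) $$

where $e_h^i$ is the $i$-th unit vector, $A_{h,i}$ the $i$-th row of $A_h$, and $\mathrm{co}(w_h^1, w_h^2)$ denotes the convex hull of the vectors $w_h^i$, $i = 1,2$.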
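The single-valued cases of (3.3) can be checked numerically. The sketch below assumes the piecewise linear form $\partial\Phi^*(w) = w/a_1$ for $w < 0$, $0$ for $0 \le w \le s$, $(w-s)/a_2$ for $w > s$ (consistent with (3.3)), and uses a hypothetical tridiagonal $A_h$; `jacobian_rows` is an illustrative name. At a kink $(-A_hv_h+b_h)_i \in \{0, s\}$ the row is a convex hull, so the function raises instead of choosing an element.

```python
import numpy as np

def F(A, b, v, a1, a2, s):
    """F_h(v) = v - dPhi*(-A v + b) with the assumed piecewise linear dPhi*."""
    w = -A @ v + b
    return v - np.where(w < 0, w / a1, np.where(w > s, (w - s) / a2, 0.0))

def jacobian_rows(A, b, v, a1, a2, s, tol=1e-12):
    """Rows of (3.3) away from the kinks: e_i + a2^{-1} A_i (w_i > s),
    e_i (0 < w_i < s), e_i + a1^{-1} A_i (w_i < 0), where w = -A v + b."""
    w = -A @ v + b
    J = np.eye(len(v))
    for i in range(len(v)):
        if abs(w[i]) < tol or abs(w[i] - s) < tol:
            raise ValueError(f"component {i} sits on a kink: the row is set-valued")
        if w[i] > s:
            J[i] += A[i] / a2
        elif w[i] < 0:
            J[i] += A[i] / a1
    return J
```

Away from the kinks $F_h$ is affine in a neighborhood, so a central difference quotient reproduces these rows up to rounding.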

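Moreover, we have

$$ F_h(u_h) - F_h(v_h) \in \mathrm{co}\,\partial F_h([u_h, v_h])(u_h - v_h), \qquad (3.4) $$

where the right-hand side in (3.4) stands for the convex hull of all vectors of type $DF_h(u_h - v_h)$ with $DF_h \in \partial F_h(w_h)$, $w_h \in [u_h, v_h] = \{z_h \mid z_h = \lambda u_h + (1-\lambda)v_h,\ \lambda \in [0,1]\}$.

In particular, there exists a not necessarily unique matrix $DF_h[u_h, v_h] \in \mathrm{co}\,\partial F_h([u_h, v_h])$ such that

$$ F_h(u_h) - F_h(v_h) = DF_h[u_h, v_h](u_h - v_h). \qquad (3.5) $$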


Using the previous results, it is now easy to show that $F_h$ is an order-coercive continuous M-function. We recall that a function $F: \mathbb{R}^N \to \mathbb{R}^N$ is said to be an M-function if $F$ is off-diagonally antitone and inverse isotone, while $F$ is called order-coercive if $Fu^k \to +\infty$ resp. $Fu^k \to -\infty$ for any monotonically increasing resp. monotonically decreasing unbounded sequence $\{u^k\} \subset \mathbb{R}^N$ (cf. e.g. [12]).
THEOREM 3.1. The function $F_h: \mathbb{R}^{N_h} \to \mathbb{R}^{N_h}$, given by (3.1), is an order-coercive continuous M-function.

Proof. For an arbitrary but fixed $u_h \in \mathbb{R}^{N_h}$ consider the function $\varphi(t) = F_{h,i}(u_h + t e_h^j)$, $1 \le i, j \le N_h$, $i \ne j$, where $e_h^j$ denotes the $j$-th unit vector in $\mathbb{R}^{N_h}$. Setting $A_h = (a_{ij})_{i,j=1}^{N_h}$ and $c_h = A_h u_h - b_h$, we find

$$ \varphi(t) = \begin{cases} u_{h,i} + a_2^{-1}(a_{ij}t + c_{h,i} + s), & -a_{ij}t > c_{h,i} + s,\\ u_{h,i}, & c_{h,i} \le -a_{ij}t \le c_{h,i} + s,\\ u_{h,i} + a_1^{-1}(a_{ij}t + c_{h,i}), & -a_{ij}t < c_{h,i}. \end{cases} $$

Since $a_1, a_2 > 0$ and $a_{ij} \le 0$ for $i \ne j$, $\varphi$ is obviously monotonically decreasing, thus proving that $F_h$ is off-diagonally antitone. Now, let $u_h, v_h \in \mathbb{R}^{N_h}$ such that $F_h(u_h) \le F_h(v_h)$. Then, observing (3.5), there exists $DF_h[u_h, v_h] \in \mathrm{co}\,\partial F_h([u_h, v_h])$ satisfying

$$ F_h(u_h) - F_h(v_h) = DF_h[u_h, v_h](u_h - v_h) \le 0. $$

But any matrix in $\mathrm{co}\,\partial F_h([u_h, v_h])$ is a nonsingular M-matrix and hence, the above inequality yields $u_h \le v_h$, proving inverse isotonicity of $F_h$.

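While continuity of $F_h$ is immediate, to prove order-coercivity we remark that by (3.1) we have, componentwise,

$$ \min\{(I_h + a_1^{-1}A_h)v_h - a_1^{-1}b_h,\ (I_h + a_2^{-1}A_h)v_h - a_2^{-1}b_h + a_2^{-1}s\} \le F_h(v_h) \le \max\{(I_h + a_1^{-1}A_h)v_h - a_1^{-1}b_h,\ (I_h + a_2^{-1}A_h)v_h - a_2^{-1}b_h + a_2^{-1}s\}. $$

Since both matrices $I_h + a_i^{-1}A_h$, $i = 1,2$, are nonsingular M-matrices, they are order-coercive (cf. e.g. [12]), whence order-coercivity of $F_h$ follows at once from the preceding inequalities.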


As an immediate consequence of the preceding result, in view of [12; Thms. 3.4, 3.7] we obtain:

COROLLARY 3.2. For any $c_h \in \mathbb{R}^{N_h}$ the nonlinear equation $F_h(u_h) = c_h$ is uniquely solvable and the Gauss-Seidel sequence for its iterative solution is globally convergent.
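Corollary 3.2 can be illustrated with a nonlinear Gauss-Seidel sketch. Because $a_{ii} > 0$, each component equation $F_{h,i}(u_h) = c_{h,i}$ is continuous, strictly increasing and piecewise linear in $u_{h,i}$, so it can be solved exactly by locating the active branch. The piecewise linear form of $\partial\Phi^*$ (consistent with (3.3)), the values $a_1 = 1$, $a_2 = 2$, $s = 1$, the tridiagonal matrix and the data are assumptions chosen only for illustration.

```python
import numpy as np

def dphi_star(w, a1, a2, s):
    """Assumed form: w/a1 for w < 0, 0 for 0 <= w <= s, (w - s)/a2 for w > s."""
    return np.where(w < 0, w / a1, np.where(w > s, (w - s) / a2, 0.0))

def F(A, b, v, a1, a2, s):
    return v - dphi_star(-A @ v + b, a1, a2, s)

def solve_component(c, r, d, a1, a2, s):
    """Exact root x of x - dphi*(r - d x) = c (strictly increasing in x, d > 0)."""
    x0, xs = r / d, (r - s) / d        # values of x at which w = 0 resp. w = s
    if c >= x0:                          # root in the branch w <= 0
        return (c + r / a1) / (1.0 + d / a1)
    if c <= xs:                          # root in the branch w >= s
        return (c + (r - s) / a2) / (1.0 + d / a2)
    return c                             # 0 < w < s, where F_i(u) = u_i

def nonlinear_gauss_seidel(A, b, c, a1, a2, s, sweeps=200):
    u = np.zeros(len(b))
    for _ in range(sweeps):
        for i in range(len(u)):
            d = A[i, i]
            r = b[i] - A[i] @ u + d * u[i]   # b_i - sum_{j != i} a_ij u_j
            u[i] = solve_component(c[i], r, d, a1, a2, s)
    return u

n = 9
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.linspace(-3.0, 3.0, n)
c = np.zeros(n)
u = nonlinear_gauss_seidel(A, b, c, a1=1.0, a2=2.0, s=1.0)
```

The branch test uses monotonicity: the scalar map equals $x$ on the middle branch, so comparing $c$ with the branch end points $x_0$ and $x_s$ identifies the branch that contains the root.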

In the following analysis, the discrete analogues $|\cdot|_s$, $s \in \mathbb{R}$, on $\mathbb{R}^{N_k}$ of the Sobolev $H^s$-norms will play a decisive role (cf. [7]). In particular, we choose norms $\|\cdot\|_p$, $0 \le p \le 2$, on $\mathbb{R}^{N_k}$ by $\|\cdot\|_p = |\cdot|_{2p-2}$, and we denote by $\|\cdot\|_{p,q}$ the corresponding matrix norms, i.e.

$$ \|A_k\|_{p,q} = \sup\{\|A_k v_k\|_q / \|v_k\|_p \mid v_k \in \mathbb{R}^{N_k},\ v_k \ne 0\}. $$

We begin with the following stability result concerning the solutions of the piecewise linear difference equations $F_k(u_k) = 0$, $0 \le k \le l$, where

$$ F_k(v_k) = v_k - \partial\Phi^*(-A_k v_k + b_k). \qquad (3.6) $$

LEMMA 3.3. Let $u_k^i$, $i = 1,2$, be the unique solutions to the difference equations $F_k^i(u_k^i) = 0$, where $F_k^i$ is given by (3.6) with $b_k = b_k^i$. Then there holds

$$ \|u_k^1 - u_k^2\|_1 \le C\,C_L\,\|b_k^1 - b_k^2\|_0, \qquad (3.7) $$

where $C_L$ denotes the Lipschitz constant of the subgradient mapping $\partial\Phi^*$.

Proof. Obviously,

$$ F_k^1(u_k^1) - F_k^1(u_k^2) = F_k^2(u_k^2) - F_k^1(u_k^2) = \partial\Phi^*(-A_k u_k^2 + b_k^1) - \partial\Phi^*(-A_k u_k^2 + b_k^2). \qquad (3.8) $$

In view of the definitions of $F_k$ and $\partial\Phi^*$ by (3.1) resp. (2.18) we find that

$$ S_k^1(u_k^1 - u_k^2) \le F_k^1(u_k^1) - F_k^1(u_k^2) \le S_k^2(u_k^1 - u_k^2), \qquad (3.9) $$

where the $i$-th row of $S_k^1$ resp. $S_k^2$ is either given by that of $(I_k + a_1^{-1}A_k)$ resp. $(I_k + a_2^{-1}A_k)$ or by that of $(I_k + a_2^{-1}A_k)$ resp. $(I_k + a_1^{-1}A_k)$. In any case both matrices $S_k^1$ and $S_k^2$ are nonsingular M-matrices and hence, using (3.9) in (3.8) gives

$$ (S_k^2)^{-1}\big(\partial\Phi^*(-A_k u_k^2 + b_k^1) - \partial\Phi^*(-A_k u_k^2 + b_k^2)\big) \le u_k^1 - u_k^2 \le (S_k^1)^{-1}\big(\partial\Phi^*(-A_k u_k^2 + b_k^1) - \partial\Phi^*(-A_k u_k^2 + b_k^2)\big). $$

Since $\|(S_k^i)^{-1}\|_{0,1} \le C$, $i = 1,2$, the assertion follows immediately from the above inequalities.

In the sequel we will assume that the solution $u_l^*$ of the difference inclusion (2.4) on level $l$ satisfies

$$ u_{l,i}^* = 0 \;\Rightarrow\; H(u_{l,i}^*) \in (0, s), \quad 1 \le i \le N_l. \qquad (3.10) $$

Then, setting $u_k^* = r_{k+1}^k u_{k+1}^*$, $0 \le k \le l-1$, with regard to the definition (2.28) of the restriction operators we also have

$$ u_{k,i}^* = 0 \;\Rightarrow\; H(u_{k,i}^*) \in (0, s), \quad 1 \le i \le N_k,\ 0 \le k \le l-1. \qquad (3.11) $$

In view of (2.18), (3.3), a direct consequence of (3.10), (3.11) is that the Jacobians $\partial F_k(u_k^*)$, $0 \le k \le l$, are single-valued. Consequently, by [4; Prop. 2.6.2], for each $\varepsilon_k > 0$ there exists $\delta_k > 0$ such that

$$ \|\partial F_{k,i}(u_k^*) - DF_{k,i}[u_k, v_k]\|_1 \le \varepsilon_k, \quad 1 \le i \le N_k, \qquad (3.12) $$

$$ \|(\partial F_k(u_k^*))^{-1}DF_k[u_k, v_k] - I_k\|_{1,1} \le \varepsilon_k \qquad (3.13) $$

for all $u_k, v_k \in \mathbb{R}^{N_k}$ and $DF_{k,i}[u_k, v_k] \in \mathrm{co}\,\partial F_{k,i}([u_k, v_k])$ resp. $DF_k[u_k, v_k] \in \mathrm{co}\,\partial F_k([u_k, v_k])$ such that $\|u_k - u_k^*\|_1 < \delta_k$, $\|v_k - u_k^*\|_1 < \delta_k$.

Since under hypothesis (2.5) the Jacobian $\partial S_k(u_k^*)$ of the Gauss-Seidel iteration mapping is also single-valued, using the preceding results we obtain:

LEMMA 3.4. Let $u_k^\nu$, $\nu \ge 0$, be the $\nu$-th iterate of the multi-grid algorithm on level $k$ and assume that $\bar u_k^\nu$ is the smoothed iterate obtained by $\kappa$ Gauss-Seidel iterations starting from $u_k^\nu$. Then there exists a matrix $D^\kappa S_k[u_k^\nu, u_k^*]$ satisfying

$$ \|D^\kappa S_k[u_k^\nu, u_k^*] - (\partial S_k(u_k^*))^\kappa\|_{1,1} \to 0 \quad \text{as } \|u_k^\nu - u_k^*\|_1 \to 0 \qquad (3.14) $$

such that

$$ \bar u_k^\nu - u_k^* = D^\kappa S_k[u_k^\nu, u_k^*](u_k^\nu - u_k^*). \qquad (3.15) $$


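Proof. Setting $v_k^{i+1} = S_k(v_k^i)$, $0 \le i \le \kappa - 1$, $v_k^0 = u_k^\nu$, and

$$ {}^{j}v_k^{i+1} = (v_{k,1}^{i+1}, \ldots, v_{k,j}^{i+1}, v_{k,j+1}^{i}, \ldots, v_{k,N_k}^{i}), \quad 1 \le j \le N_k, $$

we have

$$ F_{k,j}({}^{j}v_k^{i+1}) - F_{k,j}(u_k^*) \in \partial F_{k,j}([{}^{j}v_k^{i+1}, u_k^*])({}^{j}v_k^{i+1} - u_k^*). \qquad (3.16) $$

Hence, choosing $DF_k[v_k^{i+1}, u_k^*]$ as the matrix whose $j$-th row is an element of $\partial F_{k,j}([{}^{j}v_k^{i+1}, u_k^*])$ realizing (3.16) as an equality, and decomposing it according to

$$ DF_k[v_k^{i+1}, u_k^*] = D_k[v_k^{i+1}, u_k^*] - L_k[v_k^{i+1}, u_k^*] - R_k[v_k^{i+1}, u_k^*] $$

into its diagonal, subdiagonal and superdiagonal part, (3.15) follows easily from (3.16) with

$$ D^\kappa S_k[u_k^\nu, u_k^*] = \prod_{i=0}^{\kappa-1}\big(D_k[v_k^{i+1}, u_k^*] - L_k[v_k^{i+1}, u_k^*]\big)^{-1}R_k[v_k^{i+1}, u_k^*] $$

(factors ordered with decreasing $i$ from left to right), while (3.14) can be deduced from (3.12), observing that by Theorem 3.1 ${}^{j}v_k^{i} \to u_k^*$, $1 \le j \le N_k$, $0 \le i \le \kappa$, as $u_k^\nu \to u_k^*$.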


Since convergence of the multi-grid iteration can be deduced from that of the corresponding two-grid process, we will first consider the case of two grids $\Omega_{l-1}$ and $\Omega_l$. Moreover, for simplicity we take $\kappa_1 = \kappa > 0$, $\kappa_2 = 0$, and we assume that the correction inclusion on level $l-1$ is solved exactly. The following result gives an explicit representation of the two-grid iteration operator:


LEMMA 3.5. Let $u_l^\nu$, $\nu \ge 1$, be the iterates obtained by the two-grid algorithm MGSTEF2$(l, u_l^\nu, b_l)$ with data $\kappa_1 = \kappa > 0$, $\kappa_2 = 0$ and exact solution of the correction inclusion on level $l-1$. Then there holds

$$ u_l^{\nu+1} - u_l^* = (M_l^{l-1} + Z_l^\nu)(u_l^\nu - u_l^*), \qquad (3.17) $$

where

$$ M_l^{l-1} = \Big((\partial F_l(u_l^*))^{-1} - p_{l-1}^l\big(\partial F_{l-1}(r_l^{l-1}u_l^*)\big)^{-1}r_l^{l-1}\Big)\,\partial F_l(u_l^*)\,(\partial S_l(u_l^*))^\kappa \qquad (3.18) $$

and

$$ \|Z_l^\nu\|_{1,1} \le C(\kappa)\,\eta_\nu \qquad (3.19) $$

with $C(\kappa) > 0$ and $\eta_\nu \to 0$ for $\|u_l^\nu - u_l^*\|_1 \to 0$.
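Proof. In view of (2.22a), (2.23) and (3.1) we have

$$ F_{l-1}(r_l^{l-1}\bar u_l^\nu) = r_l^{l-1}\bar u_l^\nu - \partial\Phi^*\big(r_l^{l-1}(-A_l\bar u_l^\nu + b_l)\big) = r_l^{l-1}\big(\bar u_l^\nu - \partial\Phi^*(-A_l\bar u_l^\nu + b_l)\big) = r_l^{l-1}F_l(\bar u_l^\nu) $$

and thus

$$ DF_{l-1}[u_{l-1}, r_l^{l-1}\bar u_l^\nu](u_{l-1} - r_l^{l-1}\bar u_l^\nu) = F_{l-1}(u_{l-1}) - F_{l-1}(r_l^{l-1}\bar u_l^\nu) = r_l^{l-1}\big(F_l(u_l^*) - F_l(\bar u_l^\nu)\big) = -r_l^{l-1}DF_l[\bar u_l^\nu, u_l^*](\bar u_l^\nu - u_l^*), \qquad (3.20) $$

where $DF_l[\bar u_l^\nu, u_l^*] \in \mathrm{co}\,\partial F_l([\bar u_l^\nu, u_l^*])$ and $DF_{l-1}[u_{l-1}, r_l^{l-1}\bar u_l^\nu] \in \mathrm{co}\,\partial F_{l-1}([u_{l-1}, r_l^{l-1}\bar u_l^\nu])$. Using (3.20) in (2.24) and taking advantage of (3.15) yields

$$ u_l^{\nu+1} - u_l^* = \Big[\big(\partial F_l(u_l^*)(I_l + X_l)\big)^{-1} - p_{l-1}^l\big(\partial F_{l-1}(r_l^{l-1}u_l^*)(I_{l-1} + X_{l-1})\big)^{-1}r_l^{l-1}\Big] \cdot \Big[\partial F_l(u_l^*)(I_l + X_l)\big((\partial S_l(u_l^*))^\kappa + Y_l\big)\Big](u_l^\nu - u_l^*), \qquad (3.21) $$

where

$$ X_{l-1} = \big(\partial F_{l-1}(r_l^{l-1}u_l^*)\big)^{-1}DF_{l-1}[u_{l-1}, r_l^{l-1}\bar u_l^\nu] - I_{l-1}, $$

$$ X_l = (\partial F_l(u_l^*))^{-1}DF_l[\bar u_l^\nu, u_l^*] - I_l, $$

$$ Y_l = D^\kappa S_l[u_l^\nu, u_l^*] - (\partial S_l(u_l^*))^\kappa. $$

If we prove

$$ \|X_{l-1}\|_{1,1} \le c(\kappa)\eta_\nu, \quad \|X_l\|_{1,1} \le c(\kappa)\eta_\nu, \quad \|Y_l\|_{1,1} \le c(\kappa)\eta_\nu, \qquad (3.22) $$

the assertions (3.17), (3.18) and (3.19) can be easily deduced from (3.21).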

Now, the third inequality in (3.22) follows directly from (3.14), while the second one is a consequence of (3.13), observing that $\bar u_l^\nu \to u_l^*$ as $u_l^\nu \to u_l^*$. As far as the first inequality is concerned, again by (3.13) it only remains to be shown that $u_{l-1} - r_l^{l-1}u_l^* \to 0$ as $u_l^\nu \to u_l^*$. For this purpose we remark that $r_l^{l-1}u_l^*$ is the unique solution of $F_{l-1}(v_{l-1}) = 0$, where $F_{l-1}$ is defined by (3.1) with $h = h_{l-1}$ and $b_h$ given by $\tilde b_{l-1} = A_{l-1}r_l^{l-1}u_l^* - r_l^{l-1}(A_l u_l^* - b_l)$. Then, applying Lemma 3.3 and using $\|A_k\|_{1,0} \le C$, $l-1 \le k \le l$, as well as $\|r_l^{l-1}\|_{p,p} \le C$, $p = 0,1$, gives

$$ \|u_{l-1} - r_l^{l-1}u_l^*\|_1 \le C\,C_L\,\|b_{l-1} - \tilde b_{l-1}\|_0 \le C\,C_L\,\|\bar u_l^\nu - u_l^*\|_1 \le C\,C_L\,\|u_l^\nu - u_l^*\|_1. $$

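It follows from the representation (3.18) of the two-grid iteration operator $M_l^{l-1}$ that convergence can be expected if there exists some $\alpha > 0$ such that the smoothing property

$$ \|\partial F_l(u_l^*)(\partial S_l(u_l^*))^\kappa\|_{1,1} \le C_0(\kappa)\,h_l^{-\alpha}, \quad 0 \le \kappa \le \kappa_{\max}(h_l), \qquad (3.23) $$

and the approximation property

$$ \big\|(\partial F_l(u_l^*))^{-1} - p_{l-1}^l(\partial F_{l-1}(r_l^{l-1}u_l^*))^{-1}r_l^{l-1}\big\|_{1,1} \le C\,h_l^\alpha \qquad (3.24) $$

hold true, where $C_0(\kappa) \to 0$ as $\kappa \to \infty$ and $\kappa_{\max}(h) \to \infty$ as $h \to 0$.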

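In particular, sufficient conditions for the approximation property (3.24) are given by (cf. [7])

$$ C^{-1}\|v_{l-1}\|_1 \le \|p_{l-1}^l v_{l-1}\|_1 \le C\|v_{l-1}\|_1, \quad v_{l-1} \in \mathbb{R}^{N_{l-1}}, \qquad (3.25a) $$

$$ \|r_l^{l-1}\|_{p,p} \le C, \quad p = 0, 2, \qquad (3.25b) $$

$$ \|I_l - p_{l-1}^l r_l^{l-1}\|_{2,1} \le C\,h_{l-1}^2, \qquad (3.25c) $$

$$ \|\partial F_l(u_l^*)\|_{1,0} \le C, \qquad (3.26a) $$

$$ \|(\partial F_k(u_k^*))^{-1}\|_{p,p+1} \le C, \quad p = 0, 1,\ l-1 \le k \le l, \qquad (3.26b) $$

$$ r_l^{l-1}\,\partial F_l(u_l^*)\,p_{l-1}^l = \partial F_{l-1}(r_l^{l-1}u_l^*) + \delta_{l-1}, \qquad (3.26c) $$

$$ \|\delta_{l-1}\|_{2,0} \le C\,h_{l-1}^2. \qquad (3.26d) $$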

For standard discretizations of second order elliptic differential operators on a bounded domain $\Omega$ with Lipschitzian boundary, Hackbusch in [7] has verified both the smoothing property and the conditions implying the approximation property for prolongations $p_{l-1}^l$ and restrictions $r_l^{l-1}$ based on bilinear interpolation resp. full weighting restriction. Moreover, he has shown that certain perturbations of $p_{l-1}^l$ resp. $r_l^{l-1}$ and $\delta_{l-1}$ of order $O(h_l)$ resp. $O(h_{l-1})$ are allowed at points in an $O(h_l)$- resp. $O(h_{l-1})$-vicinity of the boundary $\Gamma = \partial\Omega$. These results can be applied to the present situation if we impose the following requirements:

The projections $\pi\Sigma_i(\cdot, t_m)$ of the interfaces $\Sigma_i(\cdot, t_m)$, $i = 1,2$, $0 \le m \le M$, into the $\Omega$ plane admit Lipschitzian parametrizations. (3.27)

The discrete interfaces $I_k(u_k)$, $0 \le k \le l$, satisfy (3.28)

$$ \max_{x_n \in I_k(u_k)} \mathrm{dist}\big(x_n, \pi\Sigma_i(\cdot, t_m)\big) = O(h_k) \quad (h_k \to 0). $$

Basically, (3.27) is a regularity condition which depends on the regularity of the solution resp. the data of the Stefan problem (cf. e.g. [10]). Concerning assumption (3.28), with regard to related results by Brezzi and Caffarelli [2] for obstacle problems, one can expect $O(h_k)$-convergence of the discrete interfaces if the discrete solution converges to the continuous one in the $L^\infty$-norm of order $O(h_k^\alpha)$, $\alpha \ge 1$.

Now, setting

$$ V_k = \{v_k \in \mathbb{R}^{N_k} \mid v_{k,i} = 0,\ x_i \in Z_k(u_k^*)\}, \quad l-1 \le k \le l, \qquad (3.29) $$

it is easily seen that $p_{l-1}^l V_{l-1} \subseteq V_l$, $r_l^{l-1}V_l \subseteq V_{l-1}$, and that $V_k$ is invariant under $\partial F_k(u_k^*)$, $l-1 \le k \le l$. Consequently, $M_l^{l-1}v_l \in V_l$ for all $v_l \in \mathbb{R}^{N_l}$, and hence it is sufficient to verify (3.23) and (3.24) resp. (3.25), (3.26) for the corresponding operators restricted to the subspaces $V_l$ resp. $V_{l-1}$. Since $\partial F_{k,i}(u_k^*) = (I_k + a_\nu^{-1}A_k)_i$, $1 \le i \le N_k$, $l-1 \le k \le l$, $\nu \in \{1,2\}$, where $A_k$ is the standard five-point approximation to $-\Delta t\,\Delta$ on $\Omega_k$, due to (3.27), (3.28) the smoothing property (3.23) and conditions (3.25), (3.26) implying the approximation property (3.24) are easily verified in view of Hackbusch's results.

Using (3.23), (3.24) in (3.18) implies $\|M_l^{l-1}\|_{1,1} \le C_C(\kappa)$ and hence, Lemma 3.5 immediately gives convergence of the two-grid iteration. Since in the case of more than two grids the iteration operator can be recursively defined by means of the corresponding two-grid operators $M_k^{k-1} + Z_k$ on levels $1 \le k \le l$, taking advantage of the fact that, under assumptions (3.27), (3.28), both (3.23) and (3.24) hold true on all levels, convergence of the multi-grid algorithm MGSTEF2$(l, u_l^\nu, b_l)$ can be established without difficulty (cf. e.g. [8]):


THEOREM 3.6. Let $u_l^\nu$, $\nu \ge 0$, be the iterates obtained by MGSTEF2$(l, u_l^\nu, b_l)$ in the case of $l+1$ grids $\Omega_k$, $0 \le k \le l$, with data $\kappa_1 = \kappa > 0$, $\kappa_2 = 0$, $\gamma_k = 2$, $1 \le k \le l$, and exact solution of the correction inclusion on the coarsest grid. Then, under assumptions (3.10), (3.27) and (3.28) there exists $\kappa_{\min} \ge 1$ such that for all $\kappa_{\min} \le \kappa \le \kappa_{\max}(h_l)$ there holds

$$ \|u_l^{\nu+1} - u_l^*\|_1 \le \big[C_C(\kappa) + C(\kappa)\eta_\nu\big]\,\|u_l^\nu - u_l^*\|_1. \qquad (3.30) $$
Related convergence results for different data (e.g. smoothing after the correction step and approximate solution of the correction inclusion on level $k = 0$) as well as convergence of the nested iteration NISTEF2$(l, u_l^{m+1}, u_l^m, b_l^{m+1})$ can be obtained by standard means and will therefore be omitted (the reader is referred to e.g. [8]).

4. NUMERICAL RESULTS¹

As an example we have considered the following problem taken from [3], which admits an analytical solution. The physical data are

$$ c_1 = 2, \quad c_2 = 6, \quad k_1 = 1, \quad k_2 = 2, \quad s = 1, $$

the spatial domain and the time interval are

$$ \Omega = (0,1) \times (0,1), \quad (T_0, T_1) = (0, \tfrac12), $$

and the source/sink term is given by

$$ f(x,y,t) = 4k_i - c_i\exp(-4t), \quad (x,y,t) \in Q_i,\ i = 1,2. $$

Then, the explicit solution of the corresponding two-phase Stefan problem is

$$ \theta(x,y,t) = (x - \tfrac12)^2 + (y - \tfrac12)^2 - \exp(-4t)/4, \quad (x,y,t) \in Q. $$
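Because the exact solution is explicit, the time of phase change at a fixed point follows in closed form: $\theta = 0$ means $\exp(-4t) = 4r^2$ with $r^2 = (x - \tfrac12)^2 + (y - \tfrac12)^2$, i.e. $t = -\ln(4r^2)/4$ (meaningful only when $4r^2 < 1$). The small Python check below evaluates this at the point $x = 21/64$, $y = 1/4$ used for the temperature history in this section; the function names are illustrative.

```python
import math

def theta_exact(x, y, t):
    """Exact temperature of the test problem."""
    return (x - 0.5) ** 2 + (y - 0.5) ** 2 - math.exp(-4.0 * t) / 4.0

def phase_change_time(x, y):
    """Time t with theta(x, y, t) = 0, i.e. exp(-4 t) = 4 r^2 (requires 4 r^2 < 1)."""
    r2 = (x - 0.5) ** 2 + (y - 0.5) ** 2
    return -math.log(4.0 * r2) / 4.0

t_star = phase_change_time(21 / 64, 1 / 4)
print(round(t_star, 4))  # 0.2498
```

This agrees with the observation that the change of phase at that point occurs close to $t = 0.25$.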

Taking the initial and boundary conditions from the exact solution, numerical solutions have been computed by Elliott's single-grid SOR algorithm [6] and by the multi-grid algorithms MGSTEF1 and MGSTEF2 with respect to various time-steps $\Delta t$ and grid hierarchies $(\Omega_k)_{k=0}^{l}$ with $h_k = 2^{-(k+1)}$, $0 \le k \le l$. In both multi-grid procedures, the corresponding nested iteration schemes NISTEF1 and NISTEF2 with $\ldots = 1$, $0 \le k \le l-1$, have been used for the computation of a suitable start iterate on the highest level.

¹All computations reported in this section have been performed on the CRAY X-MP/24 at Konrad-Zuse-Zentrum für Informationstechnik Berlin.

Figures 1 (i)-(iv) illustrate the numerical temperature pattern at times (i) t = 0.125, (ii) t = 0.250, (iii) t = 0.375 and (iv) t = 0.500, where positive/negative and zero temperature values at the grid-points on level $l = 5$ ($h_l = 1/64$) are marked by a dot/blank and "0", respectively. Figure 2 shows the exact temperature history (solid line) and the numerical temperature history (marked by "0") at the point $x = 21/64$, $y = 1/4$, where a change of phase occurs close to $t = 0.25$. Note that the numerically calculated temperature is slightly larger (smaller) than the exact temperature for $\theta < 0$ ($\theta > 0$), i.e. there is a delay when passing through the phase change temperature. The results in Figures 1 and 2 are based on computations carried out by MGSTEF2 with $\Delta t = 0.0125$, $l = 5$ ($h_l = 1/64$), $\gamma_k = 1$, $0 < k \le l$, and $\kappa_i = 1$, $1 \le i \le 3$. At each time-step the multi-grid iterations have been stopped when the discrete $L^2$-norm $\|\Delta_l^\nu(t_m)\|_{l,2}$ of the difference $\Delta_l^\nu(t_m) = \theta_l^\nu(t_m) - \theta_l^{\nu-1}(t_m)$, $\nu \ge 1$, of two subsequent iterates was less than $\varepsilon_l = 10^{-8}$.

Using the same accuracy bound $\varepsilon_l$ at each time-step, we have computed the discrete $L^2$-error $e_{l,\Delta t}(t_m) = \|\theta_l(t_m) - \theta_{\text{exact}}(t_m)\|_{l,2}$. Figure 3 (i) shows the average $L^2$-error

$$ \bar e_{l,\Delta t} = \sum_{m=1}^{M} e_{l,\Delta t}(t_m)/M $$

for $l = 5$ ($h_l = 1/64$) in dependence on $\Delta t$, whereas Figure 3 (ii) represents the dependence of $\bar e_{l,\Delta t}$ on $l$ for fixed $\Delta t = 0.0125$. The results indicate linear convergence both with respect to the time increment $\Delta t$ and the spatial step-size $h_l$.

Further, we have compared the performance of Elliott's single-grid SOR algorithm with the multi-grid algorithms MGSTEF1 and MGSTEF2 in terms of asymptotic convergence factors, which have been computed as follows: denoting by $\theta_l^{\nu^*}$ the iterate at which the accuracy bound $\varepsilon_l = 10^{-8}$ is reached, we determine

$$ q_l(t_m) = \big(\|\Delta_l^{\nu^*}\|_{l,2} / \|\Delta_l^{1}\|_{l,2}\big)^{1/((\nu^* - 1)\,N_{wu})}, $$

where $N_{wu}$ is the number of work units used for one iteration step. Since at each step of Elliott's single-grid algorithm we have performed two SOR iterations (the first with respect to a lexicographic ordering of grid-points from south-west to north-east and the second one in reverse order), here a work unit means two SOR- resp. Gauss-Seidel iterations on the highest level $l$.
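The convergence factor $q_l(t_m)$ is a per-work-unit geometric mean of the reduction of the iterate differences. A minimal Python helper, assuming the history of $\|\Delta_l^\nu\|_{l,2}$, $\nu = 1, \ldots, \nu^*$, is available as a list (the function name is hypothetical):

```python
def convergence_factor(diff_norms, n_wu):
    """q = (||Delta^{nu*}|| / ||Delta^1||) ** (1 / ((nu* - 1) * N_wu)).

    diff_norms[k] holds ||Delta_l^{k+1}||_{l,2}, the norm of the difference between
    iterates k+1 and k; the last entry is the one that met the accuracy bound.
    """
    nu_star = len(diff_norms)
    if nu_star < 2:
        raise ValueError("need at least two iterate differences")
    ratio = diff_norms[-1] / diff_norms[0]
    return ratio ** (1.0 / ((nu_star - 1) * n_wu))
```

For instance, differences that halve in every iteration step, with two work units per step, give $q = 2^{-1/2} \approx 0.707$ per work unit.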
Figure 4 (i) gives the corresponding results for $\Delta t = 0.0125$ and $l = 5$ ($h_l = 1/64$), where the values for Elliott's single-grid SOR algorithm with $\omega = 1.7$ are marked by "+", for MGSTEF1 in the case of underrelaxation with $\omega = 0.85$ by "0" and for MGSTEF2 by "Δ". The results, reflecting the superiority of the multi-grid schemes, indicate a somewhat oscillatory behavior of MGSTEF1, which is underlined by Figure 4 (ii), where the maximal, minimal and average convergence factors

$$ q_{\max} = \max_{1 \le m \le M} q_l(t_m), \quad q_{\min} = \min_{1 \le m \le M} q_l(t_m), \quad q_{av} = \sum_{m=1}^{M} q_l(t_m)/M $$

are shown in dependence on the (under-)relaxation parameter $\omega$.

Although we do not have a rigorous theoretical explanation for the need of underrelaxation in MGSTEF1, it seems that in phase-change regions the defect correction process in MGSTEF1 produces "large" errors which have to be damped appropriately by underrelaxation.
Abstract

An  experimental  study  of  the  applicability  of  a  multigrid  algorithm  to  the  solution 
of  the  neutron  transport  problem  in  a  slab  is  described.  Only  the  simplest  choices 
are  made  for  the  components  of  the  algorithm.  Experimental  results  indicate  that  the 
coarse  grid  operator  obtained  from  finite  differences  works  better  and  is  cheaper  than 
the  Galerkin  choice. 


1  INTRODUCTION 

In this report the application of a multigrid algorithm to solving the neutron transport problem in a slab is discussed. The goal of this experimental study, simply put, is to observe whether such an algorithm is applicable to the neutron transport problem and to compare the multigrid algorithm to a classical algorithm, in this case damped Jacobi. It is important to realize that only relatively simple problems were chosen for testing. In this respect this report only deals with the feasibility of the algorithm.

For the multigrid implementation only the simplest, and perhaps most naive, choices are made for the various components. This demonstrates that a 'quick and dirty' implementation is feasible. From a computational standpoint two coarse grid operators are compared: the usual Galerkin choice and the operator obtained from finite differences. Surprisingly, the finite difference operator works better and is cheaper than the Galerkin choice.

*Supported by the U.S. Air Force Office of Scientific Research under Contract No. AFOSR-82-0275. Additional support provided by the Los Alamos National Laboratory.
The report is organized as follows: Section 2 describes the particular problem that
was  solved.  In  Section  3  the  discrete  problem  is  described  along  with  the  Jacobi  iterative 
scheme.  The  multigrid  scheme  is  detailed  in  Section  4  and  the  experimental  results  are 
discussed  in  Section  5.  Section  6  contains  some  concluding  remarks  and  Section  7  describes 
some  limitations  of  this  report.  The  derivation  in  Section  2  can  be  found  in  the  book  by 
Wing  [Wing62a]  while  the  books  by  Chandrasekhar  [Chan60a]  and  by  Lewis  and  Miller 
[Lewi84a]  provide  additional  insight  into  the  problem. 

2  THE  PROBLEM 

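Let the slab width $a$ be chosen; then the goal in the general case is to determine the neutron density, $\psi(\mu; x)$, satisfying

$$ \mu\frac{\partial\psi}{\partial x}(\mu; x) + \sigma(x)\psi(\mu; x) = \frac{\gamma\sigma(x)}{2}\int_{-1}^{1}\psi(\mu'; x)\,d\mu' + S(\mu; x) \qquad (1) $$

with boundary conditions

$$ \psi(\mu, 0) = g_1(\mu), \quad \mu > 0, $$

$$ \psi(\mu, a) = g_2(\mu), \quad \mu < 0. $$

Here $\sigma(x)$ is the cross section of the material. For this study $\sigma(x)$ is taken to be 1, but in more realistic problems $\sigma(x)$ may be allowed to vary throughout the material, and may in fact be discontinuous.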

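Define

$$ H = \mu\frac{\partial}{\partial x} + \sigma(x) $$

and

$$ L = \frac{\sigma(x)}{2}\int_{-1}^{1}\cdot\ d\mu. $$

Then (1) is equivalent to

$$ H\psi = \gamma L\psi + S $$

or

$$ \psi = \gamma H^{-1}L\psi + S_1, \qquad (2) $$

where $S_1 = H^{-1}S$.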

Applying $L$ to both sides of (2) and multiplying by $\gamma$ gives

$$ \gamma L\psi = \gamma LH^{-1}\gamma L\psi + \gamma LS_1. $$


Finally, define

$$ K = LH^{-1} = \frac{\sigma(x)}{2}\int_{-1}^{1}H^{-1}\cdot\ d\mu $$

and call

$$ \phi = \gamma L\psi. $$


Then (1) is seen to be equivalent to

$$ \phi = \gamma K\phi + \gamma LS_1 $$

or

$$ (I - \gamma K)\phi = \gamma LS_1 = S_2. \qquad (3) $$

It is important to note that in the case $\sigma(x) = 1$

$$ K\phi(x) = \frac12\int_0^a E_1(|x - y|)\,\phi(y)\,dy, $$

where $E_1$ is the exponential integral

$$ E_1(x) = \int_1^\infty \frac{e^{-xs}}{s}\,ds. $$

The method of solution described here, however, does not use the relationship between $K$ and $E_1$. In particular, this allows the treatment of problems where $\sigma \ne 1$.

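Once $\phi(x)$ is determined, to obtain the density $\psi(x; \mu)$ note that

$$ \phi(x) = \frac{\gamma\sigma(x)}{2}\int_{-1}^{1}\psi(x; \mu)\,d\mu, $$

so

$$ \mu\frac{\partial\psi}{\partial x} + \sigma(x)\psi = \phi(x) + S. \qquad (4) $$

For later use define

$$ M = (I - \gamma K), $$

and problem (3) is written

$$ M\phi = S_2. \qquad (5) $$

It is problem (3) that is solved in the succeeding sections.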


3  THE  DISCRETE  PROBLEM 

The first step in computing the numerical solution of (3) is to discretize

$$ \Omega = [0, a] $$

into $N - 1$ evenly spaced pieces, each of width $h = a/(N-1)$. The discrete points are thus the points $x_i = ih$ with $0 \le i < N$. A solution to (3) is desired at each of the unknowns $\phi(x_i)$, which are denoted $\phi_i$. The discrete analogue of (5) is written

$$ M_h\phi = S_2, \qquad (6) $$

where $M_h$ is the discrete analogue of the operator $M$.

For both the Jacobi scheme and the multigrid algorithm it is necessary to approximate $K\phi$. Recall that

$$ K\phi = \frac{\sigma(x)}{2}\int_{-1}^{1}H^{-1}\phi\,d\mu. \qquad (7) $$

The integral in (7) is approximated by the six point Gauss-Legendre rule

$$ \int_{-1}^{1}f\,dx \approx \sum_{j=1}^{6}w_j f(x_j). $$

In addition, the equation

$$ H^{-1}\phi = V $$

is equivalent to solving

$$ \mu_j\frac{dV}{dx} + \sigma(x)V = \phi \qquad (8) $$

for each of the $\mu_j$ used in the Gauss-Legendre rule. Again, the boundary conditions are

$$ V(\mu, 0) = g_1(\mu) \quad \text{for } \mu > 0, $$

$$ V(\mu, a) = g_2(\mu) \quad \text{for } \mu < 0. $$

For this study the trapezoid rule is used to solve (8). This results in, for example for $\mu_j > 0$ (and $\sigma = 1$), evaluating

$$ \int_x^{x+h}\frac{dV}{dx}\,dx \approx \frac{h}{2\mu_j}\big[\phi(x+h) - V(x+h) + \phi(x) - V(x)\big]. $$

In other words, for $\mu_j > 0$,

$$ V_i = \frac{2\mu_j - h}{2\mu_j + h}V_{i-1} + \frac{h}{2\mu_j + h}\big[\phi_i + \phi_{i-1}\big], $$

with $V_0 = g_1(\mu_j)$ given. Similarly, for $\mu_j < 0$ the integration proceeds backwards starting at $x = a$. Given

$$ V_{N-1} = V(a; \mu_j) = g_2(\mu_j), $$

$$ V_i = \frac{2\mu_j + h}{2\mu_j - h}V_{i+1} - \frac{h}{2\mu_j - h}\big[\phi_i + \phi_{i+1}\big]. $$

2/!y  -  h  2fij  -  h 

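Finally, to compute $K\phi$, simply sum the $V_i$ as in (7) and the Gauss-Legendre rule; set

$$ [K\phi]_i = \frac12\sum_{j=1}^{6}w_j V_i^{(j)}, $$

where $V_i^{(j)}$ denotes the value computed for the direction $\mu_j$. Of course, during the computation it is not necessary to store the values of $V_i$; just form $[K\phi]_i$ as each $V_i$ is calculated.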
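The sweep procedure can be sketched in Python with NumPy. Assumptions: $\sigma(x) = 1$ as in this study, zero inflow data $g_1, g_2$ by default (the callables can be replaced), and `apply_K` is a hypothetical name. The nodes and weights come from `numpy.polynomial.legendre.leggauss(6)`, and, as remarked above, the $V_i$ are accumulated into $[K\phi]_i$ without being stored.

```python
import numpy as np

def apply_K(phi, a, g1=lambda mu: 0.0, g2=lambda mu: 0.0):
    """Sketch of [K_h phi]_i = (1/2) * sum_j w_j V_i^(j) for sigma(x) = 1.

    phi holds phi_0, ..., phi_{N-1} at x_i = i*h with h = a/(N-1);
    g1(mu) = V(mu, 0) for mu > 0 and g2(mu) = V(mu, a) for mu < 0.
    """
    phi = np.asarray(phi, dtype=float)
    n = phi.size
    h = a / (n - 1)
    mu, w = np.polynomial.legendre.leggauss(6)     # six-point Gauss-Legendre rule
    Kphi = np.zeros(n)
    for mu_j, w_j in zip(mu, w):
        if mu_j > 0:                                # trapezoid sweep forward from x = 0
            rho, c = (2 * mu_j - h) / (2 * mu_j + h), h / (2 * mu_j + h)
            v = g1(mu_j)
            Kphi[0] += 0.5 * w_j * v
            for i in range(1, n):
                v = rho * v + c * (phi[i] + phi[i - 1])
                Kphi[i] += 0.5 * w_j * v            # accumulate instead of storing V
        else:                                       # trapezoid sweep backward from x = a
            rho, c = (2 * mu_j + h) / (2 * mu_j - h), h / (2 * mu_j - h)
            v = g2(mu_j)
            Kphi[n - 1] += 0.5 * w_j * v
            for i in range(n - 2, -1, -1):
                v = rho * v - c * (phi[i] + phi[i + 1])
                Kphi[i] += 0.5 * w_j * v
    return Kphi
```

As a sanity check, for $\phi \equiv 1$ every sweep tends to $V = 1$ away from the inflow boundary, so in the middle of a thick slab $[K\phi]_i \approx \tfrac12\sum_j w_j = 1$, while in a thin slab the values stay strictly between 0 and 1.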

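3.1 The Jacobi Iterative Scheme

The basic iterative scheme considered is the damped Jacobi scheme with parameter $\omega$. Formally, given $\phi^0$ set

$$ \phi^{\nu+1} = \phi^\nu + \frac{1}{1+\omega}\big(S_2 - (I - \gamma K)\phi^\nu\big). \qquad (9) $$

For $\omega = 0$ this is simply the Jacobi scheme, and (9) is equivalent to computing the Neumann approximation

$$ \phi^\nu = \sum_{k=0}^{\nu-1}(\gamma K)^k S_2 + (\gamma K)^\nu\phi^0. $$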
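Iteration (9) and its Neumann-series interpretation can be written in a few lines. In this sketch `K` is any matrix standing in for $K_h$ (for example one assembled column by column from a routine that applies $K_h$); the function names and the small test matrix are illustrative assumptions.

```python
import numpy as np

def damped_jacobi(K, s2, gamma, omega, iters, phi0=None):
    """Iteration (9): phi <- phi + (S2 - (I - gamma K) phi) / (1 + omega)."""
    phi = np.zeros_like(s2) if phi0 is None else np.array(phi0, dtype=float)
    for _ in range(iters):
        phi = phi + (s2 - (phi - gamma * (K @ phi))) / (1.0 + omega)
    return phi

def neumann_sum(K, s2, gamma, nu):
    """Truncated Neumann series sum_{k=0}^{nu-1} (gamma K)^k S2."""
    term, total = s2.copy(), np.zeros_like(s2)
    for _ in range(nu):
        total = total + term
        term = gamma * (K @ term)
    return total
```

With $\omega = 0$ and $\phi^0 = 0$ the two functions agree after the same number of steps; with $\omega > 0$ the update is damped by the factor $1/(1+\omega)$.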


4 THE MULTIGRID IMPLEMENTATION

As has been known for some time, iterative improvement is one approach to obtaining an improved estimate to

$$ M_h\phi = S_2 \qquad (10) $$

given an estimate $\phi^k$. Formally, computing $\chi$ satisfying

$$ (I - \gamma K)\chi = S_2 - (I - \gamma K)\phi^k \qquad (11) $$

and setting

$$ \phi^{k+1} = \phi^k + \chi $$

solves (10). Unfortunately, the problem with this procedure is that solving (11) is as difficult as solving the original problem.

One approach to exploiting the iterative improvement idea is to solve (11) in a lower dimensional space, where less work is required to compute $\chi$. Choose the grid $\Omega_{2h}$ with spacing $2h$ for the lower dimensional space. In order to communicate (transfer information) between grid functions defined on $\Omega_h$ and grid functions defined on $\Omega_{2h}$, interpolation ($I_{2h}^h$) and restriction ($I_h^{2h}$) operators are needed.

For the interpolation operator piecewise linear interpolation is used, and for the restriction operator take

$$ I_h^{2h} = \ldots $$

Then, given $\phi^k$, to compute $\phi^{k+1}$ using the two grid multigrid algorithm:

1. Apply $m$ applications of the damped Jacobi scheme to $\phi^k$; store the result in $\phi^k$.

2. Restrict the residual to $\Omega_{2h}$: set

$$ R_{2h} = I_h^{2h}\big[S_2 - (I - \gamma K)\phi^k\big]. $$

3. Solve

$$ M_{2h}\chi_{2h} = R_{2h}, $$

where $M_{2h}$ is defined later.

4. Correct $\phi^k$ by setting

$$ \phi^{k+1} = \phi^k + I_{2h}^h\chi_{2h}. $$

Of course the same multigrid procedure can be used recursively (starting with initial guess 0) to compute the solution in step 3; this is called a true multigrid algorithm.

Two choices for $M_{2h}$ are considered. In the multigrid literature the usual choice is to take

$$ M_{2h} = M_{2h}^G = I_h^{2h}M_h I_{2h}^h. $$

This choice is referred to as the Galerkin choice. Unfortunately, unlike the usual case where $M_h$ is a matrix, it is impossible to form $M_{2h}^G$ directly; only $M_{2h}^G$ acting on a vector can be computed. For the true multigrid cases, where more than two grids are used, the Galerkin choice will still be denoted $M_{2h}^G$, with the understanding that the subscript is related to the size of the coarsest grid.

The second choice for $M_{2h}$ is to take the natural finite difference analogue of $M_h$ on the $2h$ grid $\Omega_{2h}$. This choice of $M_{2h}$ is denoted $M_{2h}^F$. From a finite difference point of view this is the natural choice for $M_{2h}$.
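Steps 1 to 4 and the two coarse operators can be illustrated with a small dense Python sketch. It is only a toy under stated assumptions: the transport sweep is replaced by a hypothetical stand-in kernel $\tfrac12 e^{-|x-y|}$ discretized with the trapezoid rule, the restriction is assumed to be full weighting (normalized transposed linear interpolation), $m = 1$, the coarse equation is solved exactly with `numpy.linalg.solve`, and $\phi_{\text{true}}$ is an arbitrary test function.

```python
import numpy as np

def kernel_matrix(n, a):
    """Hypothetical stand-in for K_h: trapezoid rule for
    (1/2) * int_0^a exp(-|x - y|) phi(y) dy on n equally spaced points."""
    x = np.linspace(0.0, a, n)
    wts = np.full(n, x[1] - x[0])
    wts[[0, -1]] *= 0.5
    return 0.5 * np.exp(-np.abs(x[:, None] - x[None, :])) * wts[None, :]

def interpolation(nc):
    """I_{2h}^h: piecewise linear interpolation from nc to 2*nc - 1 points."""
    P = np.zeros((2 * nc - 1, nc))
    for i in range(nc):
        P[2 * i, i] = 1.0
        if i + 1 < nc:
            P[2 * i + 1, i] = P[2 * i + 1, i + 1] = 0.5
    return P

def two_grid(a=1.0, n=33, gamma=0.999, omega=0.5, coarse="fd", cycles=50):
    """Return the relative error after `cycles` two-grid iterations."""
    nc = (n + 1) // 2
    M = np.eye(n) - gamma * kernel_matrix(n, a)
    P = interpolation(nc)
    R = P.T / P.T.sum(axis=1, keepdims=True)            # assumed full-weighting I_h^{2h}
    if coarse == "galerkin":
        Mc = R @ M @ P                                   # M_2h^G = I_h^2h M_h I_2h^h
    else:
        Mc = np.eye(nc) - gamma * kernel_matrix(nc, a)  # M_2h^F: rediscretized on the 2h grid
    phi_true = np.linspace(0.0, a, n) + 1.0             # hypothetical test solution
    s2 = M @ phi_true
    phi = np.zeros(n)
    for _ in range(cycles):
        phi = phi + (s2 - M @ phi) / (1.0 + omega)      # step 1: damped Jacobi, m = 1
        r2h = R @ (s2 - M @ phi)                         # step 2: restrict the residual
        chi = np.linalg.solve(Mc, r2h)                   # step 3: exact coarse solve
        phi = phi + P @ chi                              # step 4: interpolate and correct
    return np.linalg.norm(phi - phi_true) / np.linalg.norm(phi_true)
```

In this toy setting both coarse operators converge; the sketch is not meant to reproduce the comparison reported below, which depends on the actual transport operator and on the slab width.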
5 EXPERIMENTAL RESULTS

Both the multigrid algorithm of Section 4 and the Jacobi algorithm of Section 3.1 were implemented and tested. The goals of the experiments were to determine

1. how well the simple damped Jacobi scheme worked at solving problem (3);

2. whether the multigrid algorithm is applicable to the solution of the transport equation;

3. how the multigrid algorithm competes with the damped Jacobi algorithm in terms of rate of convergence and work;

4. what rate of convergence is obtained from the multigrid algorithm;

5. whether there is any practical difference between using $M_{2h}^G$ and $M_{2h}^F$.

5.1 General Remarks About the Experiments

The right hand side was computed by choosing the solution $\phi_{\text{true}}$ and setting

$$ S_2 = (I - \gamma K)\phi_{\text{true}}. $$

For testing purposes $\phi_{\text{true}}$ was chosen to be

$$ \phi_{\text{true}}(x_i) = \ldots $$

Unfortunately, from the standpoint of testing the algorithm there are a myriad of choices of parameters. Among the parameters that can be varied are: the number of points on the finest grid, $N$; the number of grid layers, $g$; the value of $\gamma$; the slab width $a$; and the value of the damped Jacobi parameter $\omega$. For the experiments discussed here, $N$ was fixed at 129 and $\gamma$ was set to .999. All the experiments were run on a CRAY 1 at the Los Alamos National Laboratory.

To determine the rate of convergence, the program was allowed to run until

$$\|r^i\|_1 = \|S - (I - \gamma K)\,\phi^i\|_1$$

was less than .0005. The final observed rate of convergence is then

$$\rho_{\mathrm{final}} = \frac{\|r^{\mathrm{final}}\|_1}{\|r^{\mathrm{final}-1}\|_1}.$$
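The stopping test and the observed rate $\rho_{\mathrm{final}}$ can be sketched for a damped Jacobi iteration as follows. This is a minimal sketch, not the program used for the experiments: it works on an explicit matrix `A`, measures the residual in the 1-norm (an assumption), and the test matrix in the usage lines is a hypothetical stand-in for the transport operator.

```python
import numpy as np


def damped_jacobi_rate(A, b, omega, tol=5e-4, max_iter=100_000):
    """Damped Jacobi x <- x + omega * D^{-1} (b - A x), started from x = 0.

    Stops once ||b - A x||_1 < tol and returns (x, rho_final, iterations), where
    rho_final = ||r^final||_1 / ||r^(final-1)||_1 is the observed rate of convergence.
    """
    d = np.diag(A)
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x
    prev = np.linalg.norm(r, 1)
    rho = float("nan")
    for k in range(1, max_iter + 1):
        x = x + omega * r / d
        r = b - A @ x
        norm = np.linalg.norm(r, 1)
        rho, prev = norm / prev, norm     # ratio of the last two residual norms
        if norm < tol:
            return x, rho, k
    return x, rho, max_iter


# Usage on a small hypothetical model matrix I - gamma*K (K = neighbour averaging).
n, gamma = 33, 0.9
A = np.eye(n) - gamma * 0.5 * (np.eye(n, k=1) + np.eye(n, k=-1))
x, rho, its = damped_jacobi_rate(A, np.ones(n), omega=1.0)
```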
Three choices for the slab width $a$ were tested: $a = 1$, 10 and 100. The damped Jacobi parameter $\omega$ was varied from .1 to 1.9 in increments of .1; for the $M_g^{2h}$ case $\omega$ ran from .1 to .9. So far no heuristic has been found for choosing the optimal $\omega$; in general, some form of adaptive procedure might be used to find it. For these experiments $m$, the number of smoothing iterations, was taken to be 1.

In the multigrid implementation, the coarse grid equation was solved by running the same damped Jacobi iteration that appears in Section 3.1 until it converged. In a better developed implementation the solver for the coarse grid equations could be optimized. In any case, when comparing rates of convergence of the algorithm, the coarse grid solver should be viewed as a 'black box.'
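A coarse-grid solver of this kind (damped Jacobi from a zero initial guess, run until it converges, with an iteration cap) can be sketched inside a simple two-grid cycle. All names are hypothetical, the model matrix is a stand-in for the transport operator, the coarse operator is formed explicitly as a Galerkin product (affordable only for this small model), and only pre-smoothing is done; the cap of 5,000 sweeps mirrors the limit used in the experiments.

```python
import numpy as np


def prolongation(n_coarse):
    """Linear interpolation from n_coarse points to 2*n_coarse - 1 points."""
    P = np.zeros((2 * n_coarse - 1, n_coarse))
    for j in range(n_coarse):
        P[2 * j, j] = 1.0
        if j + 1 < n_coarse:
            P[2 * j + 1, j] = P[2 * j + 1, j + 1] = 0.5
    return P


def damped_jacobi(A, b, x, omega, sweeps):
    d = np.diag(A)
    for _ in range(sweeps):
        x = x + omega * (b - A @ x) / d
    return x


def coarse_solve(A, b, omega, tol=1e-10, max_iter=5000):
    """'Black box' coarse solver: damped Jacobi from a zero guess until the residual
    1-norm drops below tol, or until max_iter sweeps (then the last iterate is used)."""
    d = np.diag(A)
    x = np.zeros_like(b)
    for k in range(max_iter):
        r = b - A @ x
        if np.linalg.norm(r, 1) < tol:
            return x, k
        x = x + omega * r / d
    return x, max_iter


def two_grid_cycle(A, b, x, P, omega, m=1):
    """m damped Jacobi sweeps, then a coarse-grid correction with the Galerkin operator."""
    x = damped_jacobi(A, b, x, omega, m)
    R = 0.5 * P.T                                  # assumed restriction
    e, _ = coarse_solve(R @ A @ P, R @ (b - A @ x), omega)
    return x + P @ e


# Usage on a hypothetical model matrix with 129 points and gamma = 0.999.
n, gamma = 129, 0.999
A = np.eye(n) - gamma * 0.5 * (np.eye(n, k=1) + np.eye(n, k=-1))
b, x = np.ones(n), np.zeros(n)
P = prolongation(65)
for _ in range(15):
    x = two_grid_cycle(A, b, x, P, omega=2.0 / 3.0)
```

Because the coarse correction starts from zero each cycle, the coarse solver needs fewer sweeps as the fine-grid residual shrinks, in line with the behaviour reported below.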

5.2  Numerical  Results 

The figures in Section 9 display the observed rate of convergence for the various experimental runs that were performed. The dotted line corresponds to $a = 1$, the dashed line to $a = 10$ and the solid line to $a = 100$. Note that for some graphs the $a = 1$ and $a = 10$ results overlap. Two kinds of divergence were observed. In one case the convergence rate tended towards one and eventually became one; this happened slowly as the algorithm proceeded. At other times the algorithm diverged dramatically, with a residual on the order of $10^{20}$; this happened quickly, usually after just one iteration. Both cases are displayed in the figures by plotting the observed rate as 1.05. The plots labeled 'Galerkin rate' correspond to using $M_g^{2h}$. These runs are very expensive, and unfortunately only the runs for $\omega$ less than one could be made because of limited computer resources.

6  GENERAL  CONCLUSIONS 

A number of general conclusions can be drawn from the experiments. The first is that for large $a$ ($a = 100$) the damped Jacobi algorithm was not a viable solution technique: it converged too slowly. However, even this naive multigrid implementation worked well for this particular problem. Even when 7 grids were used (1 point on the coarsest grid) the algorithm converged (for $\omega = 1.6$).

From the standpoint of applying the multigrid algorithm it is very important to note that using $M_f^{2h}$ worked at least as well as the usual choice $M_g^{2h}$. Applying $M_g^{2h}$ on the coarser grids is necessary at each stage of the algorithm and is expensive. In particular, computing the coarse grid correction with the damped Jacobi algorithm on grid 3 (33 unknowns) means that the operator

$$I_{4h}^{8h}\, I_{2h}^{4h}\, I_h^{2h}\, M^h\, I_{2h}^h\, I_{4h}^{2h}\, I_{8h}^{4h}$$

needs to be applied.

During the computation of the coarse grid correction, a limit of 5,000 was placed on the number of damped Jacobi iterations. For some of the runs this limit was reached; in these cases the 'best' estimate obtained so far was used for the coarse grid correction. This did not seem to affect the rate of convergence of the algorithm. Understandably, as the multigrid algorithm proceeded, fewer damped Jacobi iterations were required to compute the coarse grid correction: as the error tends towards zero, the initial guess (zero) becomes a better estimate of the eventual solution.

For the easy problem, $a = 1$, it appeared that 1.0 was the optimal choice of $\omega$. Unfortunately, for harder problems ($a = 100$) the optimal value of $\omega$ appears to be related to the number of grids in use.

7  CAVEATS 

It  is  important  to  realize  that  the  work  discussed  here  is  only  a  preliminary  examination 
of  the  multigrid  approach  to  the  neutron  transport  problem.  Many  important,  practical 
cases  have  not  been  discussed. 

One  situation  that  requires  further  work  is  the  case  of  large  slab  widths,  a.  A  naive 
approach  is  to  simply  add  more  points  by  increasing  N.  Unfortunately  that  results  in  an 
increase  in  the  computational  work  without  a  noticeable  increase  in  the  accuracy  of  the 
solution.  Since  both  a  and  N  are  increasing  together,  the  mesh  spacing  h  (which  controls 
the  accuracy)  does  not  change.  In  this  case  the  problem  of  resolving  the  error  on  the  coarse 
grids  still  remains.  Also,  for  large  a  it  is  not  clear  from  the  results  in  this  report  how  well 
the  Jacobi  method  acts  as  a  smoother. 

Another situation that requires additional research is the case of material discontinuities. What effect will there be on the algorithm if the function $\sigma(x)$ is not constant throughout $[0, a]$?
Figure 3: Rate using $M_f^{2h}$, 3 grids

Figure 5: Rate using $M_f^{2h}$, 5 grids

Figure 9: Rate using $M_g^{2h}$, 3 grids

Figure 10: Rate using $M_g^{2h}$, 4 grids (panel title: 4 Grid Galerkin Rate of Convergence)
An efficient multigrid method to obtain second-order accurate solutions of the 2D steady Euler equations is described and results are shown. The method is based on a nonlinear multigrid (FAS-) iteration method and on a defect correction principle. Both first- and second-order accurate finite volume upwind discretizations are considered. In the second-order discretization a limiter is used.

The method does not require any tuning of parameters. Flow solutions are presented for a channel and an airfoil. The solutions show good resolution of all flow phenomena and are obtained at low computational cost.

1980 Mathematics Subject Classification: 35L65, 35L67, 65N30, 76G15, 76H05.
1. Introduction

The Euler equations describe compressible inviscid gas flows with rotation. They are derived by considering the laws of conservation of mass, momentum and energy for an inviscid gas. The result is a nonlinear hyperbolic system of conservation laws.

To obtain numerical solutions of the steady Euler equations, the equations are discretized by a finite-volume upwind discretization [9]. Both first- and second-order discretizations are obtained by the projection-evolution approach [13]. In the projection stage of this approach the discrete values, located in the volume centers, are interpolated to yield continuous distributions in each volume. First-order accuracy is obtained by piecewise constant interpolation, second-order accuracy by piecewise linear interpolation. In case of flows with discontinuities (shock waves or slip lines), the spurious non-monotonicity (wiggles) that occurs when a second-order interpolation is used is suppressed by a limiter in the interpolation formulae [22]. In this paper we use the Van Albada limiter [1,20]. In the evolution stage, a Riemann problem is considered for the computation of the flux at each volume wall. To solve each Riemann problem approximately we use the Osher scheme [16].

To  obtain  solutions  of  the  system  of  first-order  discretized  equations,  the  nonlinear  multigrid 
(FAS-)  iteration  method  is  a  very  efficient  solution  method  [9,10].  To  improve  the  order  of  accuracy, 
we  make  use  of  a  Defect  Correction  (DeC-)  iteration  process  [2,5].  In  each  iteration  of  this  process, 
the  second-order  discretization  is  only  used  for  the  construction  of  an  appropriate  right-hand  side  for 
a  system  of  first-order  discretized  equations.  The  FAS-iteration  method  is  used  to  solve  this  system. 
Two test problems are considered. The first problem is a supersonic flow in a channel with a circular bump, $M_{\mathrm{inlet}} = 1.4$ (flow with shock generation, reflection, crossing and merging). The second problem is the transonic flow around the NACA0012 airfoil at $M_\infty = 0.85$, $\alpha = 1^\circ$ (flow with upper surface shock, lower surface shock and tail slip line).

In section 2 a description is given of the first- and second-order discretizations, and in section 3 the solution method is described. In section 4 we discuss the numerical results, and in section 5 some conclusions are listed.


2.  Discretization 

Consider on an open domain $\Omega \subset \mathbb{R}^2$ the 2D steady Euler equations in conservation form and without source terms:

$$\frac{\partial f(q)}{\partial x} + \frac{\partial g(q)}{\partial y} = 0, \qquad (2.1)$$

where $q = (\rho, \rho u, \rho v, E)^T$ is the state vector of conservative variables, and where $f(q) = (\rho u, \rho u^2 + p, \rho u v, (E + p)u)^T$ and $g(q) = (\rho v, \rho u v, \rho v^2 + p, (E + p)v)^T$ are the flux vectors. The primitive variables of (2.1) are the density $\rho$, the velocity components $u$ and $v$, and the pressure $p$. For a perfect gas, the total energy per unit of volume, $E$, is related to the primitive variables as $E = p/(\gamma - 1) + \tfrac12 \rho (u^2 + v^2)$, where $\gamma$ is the ratio of specific heats.

To allow solutions with discontinuities we consider the Euler equations in their integral form. Then the 2D steady Euler equations read

$$\oint_{\partial\Omega'} \left(\cos\phi\, f(q) + \sin\phi\, g(q)\right) ds = 0, \qquad \forall\, \Omega' \subset \Omega, \qquad (2.2)$$

where $\Omega'$ is an arbitrary subregion of $\Omega$, $\partial\Omega'$ the boundary of $\Omega'$, and $(\cos\phi, \sin\phi)$ the outward unit normal on $\partial\Omega'$. A straightforward and simple discretization of (2.2) is obtained by subdividing $\Omega$ into disjoint quadrilateral subregions $\Omega_{i,j}$ (the finite volumes) and by requiring that

$$\oint_{\partial\Omega_{i,j}} \left(\cos\phi\, f(q) + \sin\phi\, g(q)\right) ds = 0 \qquad (2.3)$$

for each volume $\Omega_{i,j}$ separately. We restrict ourselves to subdivisions such that only $\Omega_{i,j\pm1}$ and $\Omega_{i\pm1,j}$ are the neighbouring volumes of $\Omega_{i,j}$.

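Using the rotational invariance of the Euler equations,

$$\cos\phi\, f(q) + \sin\phi\, g(q) = T^{-1}(\phi)\, f(T(\phi)\, q), \qquad (2.4)$$

where $T(\phi)$ is the rotation matrix

$$T(\phi) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\phi & \sin\phi & 0 \\ 0 & -\sin\phi & \cos\phi & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad (2.5)$$

(2.3) becomes

$$\oint_{\partial\Omega_{i,j}} T^{-1}(\phi)\, f(T(\phi)\, q)\, ds = 0. \qquad (2.6)$$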
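The rotational invariance (2.4) is easy to check numerically. The sketch below assumes $\gamma = 1.4$ and an arbitrary test state; the helper names are illustrative and not taken from the paper.

```python
import numpy as np

GAMMA = 1.4  # ratio of specific heats (assumed)


def conservative(rho, u, v, p):
    """q = (rho, rho*u, rho*v, E) with E = p/(gamma - 1) + rho*(u^2 + v^2)/2."""
    return np.array([rho, rho * u, rho * v, p / (GAMMA - 1.0) + 0.5 * rho * (u * u + v * v)])


def pressure(q):
    rho, mu, mv, E = q
    return (GAMMA - 1.0) * (E - 0.5 * (mu * mu + mv * mv) / rho)


def flux_f(q):
    """f(q) = (rho*u, rho*u^2 + p, rho*u*v, (E + p)*u)."""
    rho, mu, mv, E = q
    u, p = mu / rho, pressure(q)
    return np.array([mu, mu * u + p, mv * u, (E + p) * u])


def flux_g(q):
    """g(q) = (rho*v, rho*u*v, rho*v^2 + p, (E + p)*v)."""
    rho, mu, mv, E = q
    v, p = mv / rho, pressure(q)
    return np.array([mv, mu * v, mv * v + p, (E + p) * v])


def rotation(phi):
    """Rotation matrix T(phi) of (2.5)."""
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[1.0, 0.0, 0.0, 0.0],
                     [0.0, c, s, 0.0],
                     [0.0, -s, c, 0.0],
                     [0.0, 0.0, 0.0, 1.0]])


q = conservative(1.2, 0.8, -0.3, 1.0)
phi = 0.7
T = rotation(phi)
lhs = np.cos(phi) * flux_f(q) + np.sin(phi) * flux_g(q)   # left-hand side of (2.4)
rhs = np.linalg.inv(T) @ flux_f(T @ q)                     # right-hand side of (2.4)
```

Only the $x$-flux $f$ is needed after the rotation, which is why a single one-dimensional Riemann solver suffices for all walls.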

A numerical approximation of this formula is obtained by

$$F_{i,j} := f_{i+\frac12,j} + f_{i,j+\frac12} + f_{i-\frac12,j} + f_{i,j-\frac12} = 0, \qquad (2.7)$$

with

$$f_{i+\frac12,j} = l_{i+\frac12,j}\, T^{-1}(\phi_{i+\frac12,j})\, f_R\!\left(T(\phi_{i+\frac12,j})\, q^l_{i+\frac12,j},\; T(\phi_{i+\frac12,j})\, q^r_{i+\frac12,j}\right) \qquad (2.8)$$

and similar relations for $f_{i-\frac12,j}$ and $f_{i,j\pm\frac12}$. In (2.8), $l_{i+\frac12,j}$ is the length of the volume wall $\partial\Omega_{i+\frac12,j} = \partial\Omega_{i,j} \cap \partial\Omega_{i+1,j}$, and $(\cos\phi_{i+\frac12,j}, \sin\phi_{i+\frac12,j})$ is the outward unit normal on $\partial\Omega_{i+\frac12,j}$ (fig. 2.1a). Further, $f_R: \mathbb{R}^4 \times \mathbb{R}^4 \to \mathbb{R}^4$ is a so-called approximate Riemann solver, and $q^l_{i+\frac12,j}$ and $q^r_{i+\frac12,j}$ are state vectors located at the left and right side of volume wall $\partial\Omega_{i+\frac12,j}$ (fig. 2.1b). The flux vector $f_{i+\frac12,j}$ represents the transport of mass, momentum and energy per unit of time across $\partial\Omega_{i+\frac12,j}$. For a more detailed discussion of (2.7) and (2.8) we refer to [9,19].

Fig. 2.1: Finite volume $\Omega_{i,j}$ (a. geometry; b. state vectors).
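The wall flux (2.8) can be sketched as follows. Osher's approximate Riemann solver is not reproduced here: the sketch swaps in the simpler local Lax-Friedrichs (Rusanov) flux as a stand-in with the same signature $f_R: \mathbb{R}^4 \times \mathbb{R}^4 \to \mathbb{R}^4$, assumes $\gamma = 1.4$, and uses hypothetical helper names. A conservative discretization requires that the flux seen from the neighbouring volume (outward normal rotated by $\pi$, left and right states swapped) is exactly the negative; the last lines compute both.

```python
import numpy as np

GAMMA = 1.4  # assumed ratio of specific heats


def conservative(rho, u, v, p):
    return np.array([rho, rho * u, rho * v, p / (GAMMA - 1.0) + 0.5 * rho * (u * u + v * v)])


def pressure(q):
    return (GAMMA - 1.0) * (q[3] - 0.5 * (q[1] ** 2 + q[2] ** 2) / q[0])


def flux_f(q):
    u, p = q[1] / q[0], pressure(q)
    return np.array([q[1], q[1] * u + p, q[2] * u, (q[3] + p) * u])


def rotation(phi):
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[1.0, 0, 0, 0], [0, c, s, 0], [0, -s, c, 0], [0, 0, 0, 1.0]])


def rusanov(ql, qr):
    """Local Lax-Friedrichs flux in the x-direction (stand-in for Osher's solver)."""
    def max_speed(q):
        return abs(q[1] / q[0]) + np.sqrt(GAMMA * pressure(q) / q[0])
    s = max(max_speed(ql), max_speed(qr))
    return 0.5 * (flux_f(ql) + flux_f(qr)) - 0.5 * s * (qr - ql)


def wall_flux(ql, qr, phi, length, riemann=rusanov):
    """(2.8): l * T^{-1}(phi) f_R(T(phi) q^l, T(phi) q^r); T is orthogonal, so T^{-1} = T^T."""
    T = rotation(phi)
    return length * (T.T @ riemann(T @ ql, T @ qr))


qa = conservative(1.0, 0.8, 0.1, 1.0 / 1.4)
qb = conservative(0.9, 0.7, -0.05, 0.6)
F_ab = wall_flux(qa, qb, 0.3, 0.5)             # seen from the volume holding qa
F_ba = wall_flux(qb, qa, 0.3 + np.pi, 0.5)     # same wall seen from the neighbour
```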

The application of an approximate Riemann solver is the essential part of the evolution stage, whereas the computation of the states $q^l_{i+\frac12,j}$ and $q^r_{i+\frac12,j}$ is the essential part of the projection stage [13]. As the name suggests, an approximate Riemann solver is used to obtain an approximate solution of the Riemann problem [4,6]. Several approximate Riemann solvers exist [12,16,18,21]. Here, we use Osher's Riemann solver because of its consistent treatment of boundary conditions and its continuous differentiability [9,16,17]. For details about an efficient implementation of Osher's approximate Riemann solver we refer to [9].

Depending on the way the states $q^l_{i+\frac12,j}$ and $q^r_{i+\frac12,j}$ are computed, the discretization (2.7) is first- or second-order accurate. First-order accuracy is obtained by taking

$$q^l_{i+\frac12,j} = q_{i,j}, \qquad q^r_{i+\frac12,j} = q_{i+1,j}. \qquad (2.9)$$

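Second-order accuracy can be obtained by, for example, the $\kappa$-schemes introduced by Van Leer [13]:

$$\begin{aligned} q^l_{i+\frac12,j} &= q_{i,j} + \frac{1+\kappa}{4}\,(q_{i+1,j} - q_{i,j}) + \frac{1-\kappa}{4}\,(q_{i,j} - q_{i-1,j}), \\ q^r_{i+\frac12,j} &= q_{i+1,j} - \frac{1+\kappa}{4}\,(q_{i+1,j} - q_{i,j}) - \frac{1-\kappa}{4}\,(q_{i+2,j} - q_{i+1,j}), \end{aligned} \qquad (2.10)$$

with $\kappa \in [-1, 1]$. For $\kappa = -1$, $\kappa = 0$, $\kappa = 1/3$ and $\kappa = 1$ we find, respectively, the fully one-sided upwind scheme, the Fromm scheme, the upwind-biased scheme (third-order accurate for 1D problems) and the central scheme. A disadvantage of these $\kappa$-schemes is that near discontinuities spurious non-monotonicity (wiggles) appears [11]. A way to avoid this is to use a limiter. We modify the $\kappa$-schemes by introducing a limiter such that the schemes become monotone and remain second-order accurate. Let $q^{l,k}_{i+\frac12,j}$ and $q^{r,k}_{i+\frac12,j}$ be the $k$th component ($k = 1, 2, 3, 4$) of $q^l_{i+\frac12,j}$ and $q^r_{i+\frac12,j}$. We rewrite (2.10) as

$$\begin{aligned} q^{l,k}_{i+\frac12,j} &= q^k_{i,j} + \tfrac12\, \psi\!\left(R^k_{i+\frac12,j}\right)(q^k_{i,j} - q^k_{i-1,j}) \quad \text{and} \\ q^{r,k}_{i+\frac12,j} &= q^k_{i+1,j} + \tfrac12\, \psi\!\left(1/R^k_{i+\frac32,j}\right)(q^k_{i+1,j} - q^k_{i+2,j}), \end{aligned} \qquad (2.11)$$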
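where

$$R^k_{i+\frac12,j} = \frac{q^k_{i+1,j} - q^k_{i,j}}{q^k_{i,j} - q^k_{i-1,j}}, \qquad (2.12)$$

and where $\psi: \mathbb{R} \to \mathbb{R}$ is defined by

$$\psi(R) = \frac{1-\kappa}{2} + \frac{1+\kappa}{2}\,R. \qquad (2.13)$$

If we replace $\psi(R)$ in (2.11) by $\psi^{\mathrm{lim}}(R)$, where $\psi^{\mathrm{lim}}(R)$ is defined by

$$\psi^{\mathrm{lim}}(R) = \frac{R + |R|}{1 + R^2}\,\psi(R), \qquad (2.14)$$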

then (2.11) results in a monotone and yet second-order accurate scheme [20]. The function $\psi^{\mathrm{lim}}: \mathbb{R} \to \mathbb{R}$ is called the limiter. The choice $\kappa = 0$ corresponds with the Van Albada limiter [1,20]. An advantage of the Van Albada limiter is that in the neighbourhood of discontinuities the scheme resembles the fully one-sided upwind scheme, which is a natural scheme in such regions. For all flow solutions presented in this paper we used $\kappa = 0$ (the Van Albada limiter), although $\kappa = 1/3$ seems a reasonable choice as well.
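For one component along one grid line, the limited $\kappa$-reconstruction of (2.10)-(2.14) can be sketched as below. The function names are hypothetical. When the backward difference vanishes, $R$ in (2.12) is undefined; the sketch then drops the correction term, which is the limit of $\tfrac12\psi^{\mathrm{lim}}(R)\,(q_{i,j} - q_{i-1,j})$ as that difference tends to zero. This handling is an assumption, not taken from the paper.

```python
def psi(R, kappa):
    """Unlimited kappa-scheme function (2.13)."""
    return 0.5 * (1.0 - kappa) + 0.5 * (1.0 + kappa) * R


def psi_lim(R, kappa):
    """Limited version (2.14); it vanishes for R <= 0, and kappa = 0 gives Van Albada."""
    return (R + abs(R)) / (1.0 + R * R) * psi(R, kappa)


def left_state(q_im1, q_i, q_ip1, kappa, limit=True):
    """q^l_{i+1/2} for one component, from q_{i-1}, q_i and q_{i+1}."""
    fwd, bwd = q_ip1 - q_i, q_i - q_im1
    if not limit:                          # plain kappa-scheme (2.10)
        return q_i + 0.25 * (1.0 + kappa) * fwd + 0.25 * (1.0 - kappa) * bwd
    if bwd == 0.0:                         # R undefined: drop the correction (assumption)
        return q_i
    return q_i + 0.5 * psi_lim(fwd / bwd, kappa) * bwd   # (2.11) with R from (2.12)


def right_state(q_i, q_ip1, q_ip2, kappa, limit=True):
    """q^r_{i+1/2}: the mirror image of left_state, built from q_{i+2}, q_{i+1}, q_i."""
    return left_state(q_ip2, q_ip1, q_i, kappa, limit)


# Usage: linear data are reproduced, a local maximum is not overshot.
print(left_state(0.0, 1.0, 2.0, kappa=0.0))   # 1.5
print(left_state(0.0, 1.0, 0.0, kappa=0.0))   # 1.0
```

Writing $q^r$ as the mirror image of $q^l$ keeps the two formulas in (2.11) consistent by construction.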

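In case $\Omega_{i,j}$ is a boundary volume, so that for example $\partial\Omega_{i+\frac12,j}$ is part of the domain boundary, no limiter can be used to compute $q^l_{i+\frac12,j}$ and $q^r_{i-\frac12,j}$. In this case we use a simple linear interpolation, i.e.

$$q^l_{i+\frac12,j} = q_{i,j} + \tfrac12\,(q_{i,j} - q_{i-1,j}) \quad \text{and} \quad q^r_{i-\frac12,j} = q_{i,j} - \tfrac12\,(q_{i,j} - q_{i-1,j}). \qquad (2.15)$$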

The boundary conditions, together with the state $q^l_{i+\frac12,j}$, are used to compute the state $q^r_{i+\frac12,j}$. This computation is done by considering the Riemann boundary value problem [9,17]. The flux $f_{i+\frac12,j}$ at $\partial\Omega_{i+\frac12,j}$ is computed by (2.8).

3. Solution method

The method to solve the system of nonlinear discretized equations is based on a multigrid technique. For readers unfamiliar with multigrid techniques we refer to [3,5]. Let

$$F^1_h(q_h) = r_h \qquad (3.1)$$

and

$$F^2_h(q_h) = r_h \qquad (3.2)$$

be first- and second-order accurate finite-volume upwind discretizations of the 2D steady Euler equations with source term $r$. Hence, $(F^1_h(q_h))_{i,j} = F_{i,j}$ is defined by (2.7), (2.8) and (2.9), and $(F^2_h(q_h))_{i,j} = F_{i,j}$ is defined by (2.7), (2.8), (2.11) and (2.12) with $\psi(R) = \psi^{\mathrm{lim}}(R)$ (the Van Albada limiter). Although in general $r = 0$, we prefer to describe the solution method for systems with an arbitrary right-hand side. The subscript $h$ denotes the meshwidth. To apply multigrid we construct a nested set of grids, such that each volume in a grid is the union of 4 volumes in the next finer grid, in the obvious way. Let $\Omega_{h_k}$, $k = 1, \ldots, l$, with $h_1 > h_2 > \cdots > h_l = h$, be a sequence of such nested grids. So $\Omega_{h_1}$ and $\Omega_{h_l}$ are respectively the coarsest and the finest grid.

The solution method for (3.2) can be divided into three successive stages. The first stage is the Full Multigrid (FMG-) method, which is used to find a good initial approximation of (3.1). The second stage is a nonlinear multigrid (FAS-) iteration method, which is used to find a better approximate solution of (3.1); its first iterand is the solution obtained by the FMG-method. The FAS-iteration method is a very efficient solution method for (3.1) [9,10]. In general, for a single FAS-iteration, the reduction factor of the first-order residual lies in the range 0.1-0.5. Therefore, just a few FAS-iterations are sufficient to drive the first-order residual to machine-zero. The third and last stage is a Defect Correction (DeC-) iteration process, which is used to find an approximate solution of (3.2). The first iterand of this process is obtained from the second stage. We will now discuss these stages more fully.
Stage I: The Full Multigrid (FMG-) method

Let

$$F^1_{h_k}(q_{h_k}) = r_{h_k} \qquad (3.3)$$

be the first-order discretization on $\Omega_{h_k}$, $k = 1, 2, \ldots, l$. The FMG-method (or nested iteration) starts with a crude initial estimate of $q_{h_1}$, the solution on the coarsest grid. To obtain an initial estimate on the finer grid $\Omega_{h_{k+1}}$, first the solution on the next coarser grid $\Omega_{h_k}$ is improved by a single FAS-iteration (stage II). This improved approximation is then interpolated to the finer grid $\Omega_{h_{k+1}}$. These steps are repeated until the finest level has been reached. The interpolation used to obtain the first guess on the next finer grid is a bilinear interpolation. For this purpose the grid $\Omega_{h_k}$ is subdivided into disjoint sets of 2 × 2 volumes. The four states corresponding with each set are interpolated in a bilinear way, and since each volume of $\Omega_{h_k}$ overlaps 2 × 2 finer grid volumes of $\Omega_{h_{k+1}}$, 4 × 4 new states are obtained on $\Omega_{h_{k+1}}$.
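For one scalar component on a uniform Cartesian patch, the bilinear interpolation of a 2 × 2 set of coarse states to the 4 × 4 finer volumes can be sketched as below. Uniform spacing is an assumption (the actual grids are curvilinear), and the name `fmg_interpolate_block` is hypothetical. The outer rows and columns of fine volumes lie outside the square spanned by the four coarse centres, so the formula extrapolates there.

```python
import numpy as np


def fmg_interpolate_block(c):
    """Bilinear interpolation of a 2 x 2 block of coarse values to the 4 x 4 fine volumes.

    In local coordinates the coarse centres sit at 0 and 1 in each direction, and the
    four fine centres at -1/4, 1/4, 3/4 and 5/4 (the outer ones are extrapolated).
    """
    t = np.array([-0.25, 0.25, 0.75, 1.25])
    W = np.stack([1.0 - t, t], axis=1)            # 1D linear weights, shape (4, 2)
    return W @ np.asarray(c, dtype=float) @ W.T    # tensor product in both directions


def f(x, y):
    """Bilinear test function, which the interpolation must reproduce exactly."""
    return 1.0 + 2.0 * x + 3.0 * y + 0.5 * x * y


xc = np.array([0.0, 1.0])
xf = np.array([-0.25, 0.25, 0.75, 1.25])
fine = fmg_interpolate_block(f(xc[:, None], xc[None, :]))
```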


Stage II: The nonlinear multigrid (FAS-) iteration method

To find a better approximation to (3.1) we apply the FAS-iteration method on the finest grid ($\Omega_{h_l}$). One FAS-iteration on a general grid $\Omega_{h_k}$ is recursively defined by the following steps:

(0) Start with an approximate solution $q_{h_k}$.

(1) Improve $q_{h_k}$ by application of $p$ (pre-)relaxation iterations to $F^1_{h_k}(q_{h_k}) = r_{h_k}$.

(2) Compute the defect $d_{h_k} := r_{h_k} - F^1_{h_k}(q_{h_k})$.

(3) Find an approximation of $q_{h_{k-1}}$ on the next coarser grid $\Omega_{h_{k-1}}$. Either use $q_{h_{k-1}} = \tilde I^{h_{k-1}}_{h_k} q_{h_k}$, where $\tilde I^{h_{k-1}}_{h_k}$ is a restriction operator, or use the last obtained approximation $q_{h_{k-1}}$.

(4) Compute $r_{h_{k-1}} = F^1_{h_{k-1}}(q_{h_{k-1}}) + I^{h_{k-1}}_{h_k} d_{h_k}$, where $I^{h_{k-1}}_{h_k}$ is another restriction operator.

(5) Approximate the solution of $F^1_{h_{k-1}}(q_{h_{k-1}}) = r_{h_{k-1}}$ by $\sigma$ FAS-iterations on $\Omega_{h_{k-1}}$. The result is called $\tilde q_{h_{k-1}}$. ($\sigma = 1$ results in a V-cycle and $\sigma = 2$ in a W-cycle.)

(6) Correct the current solution by $q_{h_k} := q_{h_k} + I^{h_k}_{h_{k-1}}(\tilde q_{h_{k-1}} - q_{h_{k-1}})$, where $I^{h_k}_{h_{k-1}}$ is a prolongation operator.

(7) Improve $q_{h_k}$ by application of $q$ (post-)relaxation iterations to $F^1_{h_k}(q_{h_k}) = r_{h_k}$.

The steps (2)-(6) are called the coarse-grid correction. These steps are skipped on the coarsest grid.
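Steps (0)-(7) translate directly into a recursive function. The sketch below is a hypothetical 1D analogue, not the Euler solver: it applies the same recursion to the model problem $-u'' + u^3 = r$ on a vertex-centred grid with zero boundary values, relaxes with pointwise nonlinear Gauss-Seidel using one local Newton step per point (in the spirit of the local linearization used for relaxation in this paper), and uses injection in step (3), full weighting in step (4) and linear interpolation in step (6). These transfer operators differ from the finite-volume operators (3.4)-(3.6).

```python
import numpy as np


def apply_op(u, h):
    """Model operator A_h(u) = -u'' + u^3 at interior points (boundary rows left at 0)."""
    Au = np.zeros_like(u)
    Au[1:-1] = -(u[:-2] - 2.0 * u[1:-1] + u[2:]) / h**2 + u[1:-1] ** 3
    return Au


def relax(u, r, h, sweeps):
    """Nonlinear Gauss-Seidel: one local Newton step per interior point, in place."""
    for _ in range(sweeps):
        for i in range(1, len(u) - 1):
            res = r[i] + (u[i - 1] - 2.0 * u[i] + u[i + 1]) / h**2 - u[i] ** 3
            u[i] += res / (2.0 / h**2 + 3.0 * u[i] ** 2)
    return u


def restrict_fw(v):
    """Full weighting onto the next coarser grid (boundary entries set to 0)."""
    vc = np.zeros((len(v) - 1) // 2 + 1)
    vc[1:-1] = 0.25 * v[1:-2:2] + 0.5 * v[2:-1:2] + 0.25 * v[3::2]
    return vc


def prolong_lin(vc):
    """Linear interpolation onto the next finer grid."""
    v = np.zeros(2 * (len(vc) - 1) + 1)
    v[::2] = vc
    v[1::2] = 0.5 * (vc[:-1] + vc[1:])
    return v


def fas_cycle(u, r, h, p=1, q=1, sigma=1):
    """One FAS-iteration following steps (0)-(7); sigma = 1 gives a V-cycle."""
    if len(u) <= 3:                                # coarsest grid: a single unknown
        return relax(u, r, h, 50)
    u = relax(u, r, h, p)                          # (1) pre-relaxation
    d = r - apply_op(u, h)                         # (2) defect
    uc = u[::2].copy()                             # (3) restriction by injection
    rc = apply_op(uc, 2 * h) + restrict_fw(d)      # (4) coarse right-hand side
    uc_tilde = uc.copy()
    for _ in range(sigma):                         # (5) sigma coarse FAS-iterations
        uc_tilde = fas_cycle(uc_tilde, rc, 2 * h, p, q, sigma)
    u = u + prolong_lin(uc_tilde - uc)             # (6) coarse-grid correction
    return relax(u, r, h, q)                       # (7) post-relaxation


# Usage: a right-hand side consistent with a hypothetical smooth solution.
n = 64
h = 1.0 / n
x = np.linspace(0.0, 1.0, n + 1)
u_ref = np.sin(np.pi * x)
r = np.zeros(n + 1)
r[1:-1] = apply_op(u_ref, h)[1:-1]
u = np.zeros(n + 1)
res0 = np.abs(r - apply_op(u, h)).max()
for _ in range(15):
    u = fas_cycle(u, r, h)
res = np.abs(r - apply_op(u, h)).max()
```

Note that the coarse-grid correction in step (6) prolongs the change $\tilde q_{h_{k-1}} - q_{h_{k-1}}$, not the coarse solution itself; this is what makes FAS work for nonlinear equations.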


In order to complete the description of a FAS-iteration we have to discuss: (i) the choice of the transfer operators $\tilde I^{h_{k-1}}_{h_k}$, $I^{h_{k-1}}_{h_k}$ and $I^{h_k}_{h_{k-1}}$, (ii) the relaxation method, and (iii) the FAS-strategy, i.e. the numbers $p$, $q$ and $\sigma$.

(i) Choice of the operators:

The restriction operators $\tilde I^{h_{k-1}}_{h_k}$ and $I^{h_{k-1}}_{h_k}$ are defined by

$$(q_{h_{k-1}})_{i,j} = (\tilde I^{h_{k-1}}_{h_k} q_{h_k})_{i,j} := \tfrac14\left\{(q_{h_k})_{2i,2j} + (q_{h_k})_{2i-1,2j} + (q_{h_k})_{2i,2j-1} + (q_{h_k})_{2i-1,2j-1}\right\} \qquad (3.4)$$

and

$$(d_{h_{k-1}})_{i,j} = (I^{h_{k-1}}_{h_k} d_{h_k})_{i,j} := (d_{h_k})_{2i,2j} + (d_{h_k})_{2i-1,2j} + (d_{h_k})_{2i,2j-1} + (d_{h_k})_{2i-1,2j-1}. \qquad (3.5)$$

The prolongation operator $I^{h_k}_{h_{k-1}}$ is defined by

$$(I^{h_k}_{h_{k-1}} q_{h_{k-1}})_{2i,2j} = (I^{h_k}_{h_{k-1}} q_{h_{k-1}})_{2i-1,2j} = (I^{h_k}_{h_{k-1}} q_{h_{k-1}})_{2i,2j-1} = (I^{h_k}_{h_{k-1}} q_{h_{k-1}})_{2i-1,2j-1} := (q_{h_{k-1}})_{i,j}. \qquad (3.6)$$

Note that this prolongation is different from the bilinear interpolation used in FMG. By defining the transfer operators in this way, it can be verified that

$$F^1_{h_{k-1}} = I^{h_{k-1}}_{h_k}\, F^1_{h_k}\, I^{h_k}_{h_{k-1}}, \qquad (3.7)$$

i.e. the first-order coarse grid discretizations of the steady Euler equations are Galerkin approximations of the fine grid discretizations. This is a very important property because it implies that the coarse grid correction efficiently reduces the smooth component in the residual.
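The transfer operators (3.4)-(3.6) and the Galerkin property (3.7) can be checked on a simple model. The sketch assumes a uniform periodic Cartesian grid and replaces the first-order Euler discretization by a hypothetical first-order upwind flux balance for linear advection $a q_x + b q_y$ with $a, b > 0$, where each wall flux carries the wall length, as in (2.8). It uses 0-based indices, so fine volumes $2i$ and $2i+1$ belong to coarse volume $i$.

```python
import numpy as np


def restrict_state(q):
    """(3.4): each coarse state is the average of the 2 x 2 fine states it contains."""
    nx, ny = q.shape[0] // 2, q.shape[1] // 2
    return q.reshape(nx, 2, ny, 2).mean(axis=(1, 3))


def restrict_defect(d):
    """(3.5): each coarse defect is the sum of the 2 x 2 fine defects it contains."""
    nx, ny = d.shape[0] // 2, d.shape[1] // 2
    return d.reshape(nx, 2, ny, 2).sum(axis=(1, 3))


def prolong(qc):
    """(3.6): piecewise-constant prolongation; each coarse value is copied to 4 fine volumes."""
    return np.kron(qc, np.ones((2, 2)))


def upwind_operator(q, h, a=1.0, b=0.5):
    """Hypothetical first-order upwind flux balance for a*q_x + b*q_y (a, b > 0) on a
    periodic Cartesian grid with cell size h; every wall flux carries the wall length h."""
    return h * a * (q - np.roll(q, 1, axis=0)) + h * b * (q - np.roll(q, 1, axis=1))


qc = np.random.default_rng(1).standard_normal((4, 6))
h = 0.1
galerkin = restrict_defect(upwind_operator(prolong(qc), h))   # right-hand side of (3.7)
direct = upwind_operator(qc, 2 * h)                           # coarse discretization itself
```

The identity holds because the fine walls inside a coarse volume carry equal states after piecewise-constant prolongation and so contribute nothing, while the fine walls on its boundary add up to the coarse wall fluxes. Summing the defects, rather than averaging them, is essential for this.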

(ii) The relaxation method:

We use Collective Symmetric Gauss-Seidel (CSGS-) relaxation. Collective means that the four variables corresponding to a single volume are relaxed simultaneously. At each volume visited we solve the four nonlinear equations by Newton's method (local linearization). It appears that a single Newton iteration is sufficient. For details about the local linearization formulae we refer to [9].

Fig. 3.1: Complete multigrid solution process for obtaining a first-order accurate solution (5 levels; □: relaxation).

(iii) The FAS-strategy:

We use a fixed strategy: $\sigma = 1$ and $p = q = 1$, i.e. we use V-cycles with one pre- and one post-relaxation.

When the exact solution of (3.1) is desired, more than one FAS-iteration has to be performed. In fig. 3.1 we give an illustration of the complete solution process for (3.1). It is supposed that there are 5 nested grids ($l = 5$). Between two succeeding points A, B we have a single FAS-iteration (V-cycle). Between two succeeding points B, A in the FMG-stage, we have the bilinear prolongation.

Stage III: The Defect Correction (DeC-) iteration method

For an introduction to the defect correction approach we refer to [2,5]. We approximate the solution of (3.2) with the DeC-iteration process

$$F^1_h(q_h^{(n+1)}) = F^1_h(q_h^{(n)}) + \left(r_h - F^2_h(q_h^{(n)})\right), \qquad n = 0, 1, 2, \ldots, \qquad (3.8)$$

where $q_h^{(0)}$ is the solution obtained in stage II with only a single FAS-iteration. If $q_h^{(n+1)} = q_h^{(n)}$, (3.8) reduces to $F^2_h(q_h^{(n)}) = r_h$, so the fixed point of this iteration process is the solution of (3.2). In fact it is not really necessary to iterate until convergence. For smooth solutions a single DeC-iteration is sufficient to obtain second-order accuracy [7]. For solutions with discontinuities, experience shows that a few DeC-iterations already significantly improve the accuracy of the solution [11]. When more DeC-iterations are performed, the iterand $q_h^{(n)}$ does not always converge to the solution of (3.2) (see for example the channel flow problem in section 4). But even in those cases, a significant improvement of the accuracy of the solution is observed.

For each DeC-iteration we have to solve a first-order system with an appropriate right-hand side. It appeared to be inefficient to solve this system very accurately. Application of a single FAS-iteration to approximate $q_h^{(n+1)}$ in (3.8) usually is the most efficient strategy [7,11].
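The DeC-iteration (3.8) can be illustrated on a linear model in which everything is explicit. In this hypothetical sketch, $F^1_h$ and $F^2_h$ are replaced by a first-order upwind and a second-order central discretization of $u' + u = r$ on a periodic grid, and each first-order system is solved exactly with a dense solver instead of by a single FAS-iteration. It only shows that, for a smooth solution, one DeC-iteration already removes most of the first-order error.

```python
import numpy as np

N = 64
h = 1.0 / N
x = h * np.arange(N)
I = np.eye(N)
back = np.roll(I, 1, axis=0)     # (back @ u)[i] = u[i-1], periodic
fwd = np.roll(I, -1, axis=0)     # (fwd @ u)[i]  = u[i+1], periodic

A1 = (I - back) / h + I          # first-order upwind discretization of u' + u
A2 = (fwd - back) / (2 * h) + I  # second-order central discretization of u' + u


def dec(A1, A2, r, q0, n_iter):
    """Defect correction (3.8): solve A1 q^(n+1) = A1 q^(n) + (r - A2 q^(n))."""
    q = q0.copy()
    for _ in range(n_iter):
        q = np.linalg.solve(A1, A1 @ q + (r - A2 @ q))
    return q


u_exact = np.sin(2 * np.pi * x)
r = 2 * np.pi * np.cos(2 * np.pi * x) + u_exact   # u' + u for the exact solution
q0 = np.linalg.solve(A1, r)                        # first-order solution, playing q^(0)
q1 = dec(A1, A2, r, q0, 1)                         # after a single DeC-iteration
err_first = np.abs(q0 - u_exact).max()
err_dec = np.abs(q1 - u_exact).max()
```

In this model the error of the iterand relative to the second-order solution is multiplied by $A_1^{-1}(A_1 - A_2)$ in each DeC-iteration; that factor is small for smooth modes and close to one for the most oscillatory ones. This is consistent with the observation that a few DeC-iterations already give most of the accuracy gain, while full convergence is slow.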


Fig.  3.2:  Complete  multigrid  solution  process  for  obtaining  a  second-order  accurate  solution  (5  levels). 
In fig. 3.2 we give an illustration of the complete process for the approximate solution of (3.2). Suppose there are 5 nested grids ($l = 5$). Between two succeeding points A, B we have one FAS-iteration (V-cycle). Between two succeeding points B, A we have a bilinear interpolation in the FMG-stage, and an appropriate right-hand side computation in the DeC-stage.

4.  Results 

To show that the method computes typical Euler flows accurately and efficiently, we consider two standard Euler test cases: (i) a supersonic flow in a channel with a circular-arc bump, and (ii) a transonic flow around the NACA0012 airfoil.

The  channel: 

The geometry of the channel and the grid (96 × 32) are shown in fig. 4.1. The bump has a thickness ratio of 4%. For the multigrid algorithm we use 4 coarser grids. At the inflow boundary ($x = -1$) the Mach number has been prescribed: $M_{\mathrm{inlet}} = 1.4$. For results obtained by others, we refer to [15,24].

For this problem we compare the first-order solution with a second-order solution. The first-order solution is obtained with the FAS-iteration process. Fig. 4.2a shows the convergence history of this process. The residual is computed as $\sum_{i,j} |(F^1_h(q_h^{(n)}))_{i,j}|$ ($L_1$-norm). A second-order accurate solution is obtained with the DeC-iteration process. The convergence history is shown in fig. 4.2b. Here, the residual is computed as $\sum_{i,j} |(F^2_h(q_h^{(n)}))_{i,j}|$. The convergence is slow, but, as mentioned before, the DeC-iteration process is not used to obtain the solution of $F^2_h(q_h) = 0$, but to improve the accuracy of the first-order solution.

Fig. 4.3a,b show the iso-Mach lines of the first- and second-order solution, respectively. In both solutions, the oblique shocks generated at the leading and trailing edge of the bump are clearly visible. In the first-order solution the shocks are severely smeared, and the reflection of the leading-edge shock at the upper wall is hardly visible. The second-order solution, on the contrary, shows very sharp shocks. The reflection of the leading-edge shock at the upper wall, its crossing with the trailing-edge shock, its further reflection at the lower wall and finally its merging with the trailing-edge shock are all clearly visible.

Fig.  4.4  shows  the  Mach  number  distributions  along  the  lower  surface  of  the  channel.  Downstream  of 
the  bump,  the  large  qualitative  difference  between  the  first-  and  second-order  solution  is  observed 
once  more. 

Finally, fig. 4.5 shows the entropy distribution $s/s_{\mathrm{inlet}} - 1$, with $s = p\rho^{-\gamma}$, for the first- and second-order solution along the lower channel wall. The first-order solution shows spurious entropy generation along the entire bump. The second-order solution has no such entropy generation, but shows some spurious non-monotonicity. The latter is caused by the fact that no limiter can be used near boundaries (see (2.15)).
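The entropy measure plotted in fig. 4.5 can be computed from the conservative variables as follows. This is a small sketch assuming $\gamma = 1.4$, with hypothetical function names.

```python
import numpy as np

GAMMA = 1.4  # assumed ratio of specific heats


def conservative(rho, u, v, p):
    return np.array([rho, rho * u, rho * v, p / (GAMMA - 1.0) + 0.5 * rho * (u * u + v * v)])


def entropy_measure(q):
    """s = p * rho**(-gamma) from a conservative state q = (rho, rho*u, rho*v, E)."""
    rho, mu, mv, E = q
    p = (GAMMA - 1.0) * (E - 0.5 * (mu * mu + mv * mv) / rho)
    return p * rho ** (-GAMMA)


def entropy_deviation(q, q_inlet):
    """s / s_inlet - 1: zero for isentropic flow, positive where entropy is produced."""
    return entropy_measure(q) / entropy_measure(q_inlet) - 1.0


q_inlet = conservative(1.0, 1.4, 0.0, 1.0 / 1.4)
q_isentropic = conservative(1.3, 1.1, 0.2, (1.0 / 1.4) * 1.3 ** 1.4)  # p / rho^gamma unchanged
```

Any nonzero value of this measure in a region without shocks, such as along the bump in the first-order solution, is numerical entropy production.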


Fig. 4.1: 96 × 32-grid channel.

Fig. 4.3: Iso-Mach lines (a. first-order; b. second-order).

Fig. 4.4: Mach number distributions along lower channel wall (a. first-order; b. second-order).


The NACA0012 airfoil:

As standard Euler test case for the NACA0012 airfoil we consider $M_\infty = 0.85$, $\alpha = 1^\circ$ ($M_\infty$ denotes the Mach number at infinity and $\alpha$ the airfoil's angle of attack). We compare our results with results from [23]. We use a 128 × 32 O-type grid with the outer boundary at an approximate distance of 100 chord lengths from the airfoil (fig. 4.6). Following [8,11], we impose unperturbed flow conditions at the outer boundary, although we do not overimpose. For the subsonic outer boundary we impose 3 conditions at the inflow part of that boundary ($u = M_\infty \cos\alpha$, $v = M_\infty \sin\alpha$, $c = 1$), and 1 condition at the outflow part ($m = \cos\alpha$). We perform 10 DeC-iterations and use a multigrid algorithm with (again) 4 coarser grids.

The results obtained are presented in fig. 4.7. In fig. 4.7a and 4.7b we present convergence histories. In fig. 4.7a the residual ratio $\sum_{i,j} |(F^2_h(q_h^{(n)}))_{i,j}| \,/\, \sum_{i,j} |(F^2_h(q_h^{(0)}))_{i,j}|$ ($L_1$-norm) is plotted versus $n$, the number of DeC-iterations. In fig. 4.7b we show the convergence history of the lift and drag force acting on the airfoil. (For a definition of lift, drag and their proper scaling we refer to e.g. [14].) Although the $L_1$-norm of the residual ratio decreases rather slowly, fig. 4.7b shows that practical convergence of the lift and drag has been obtained after about 7 DeC-iterations. This is typical for DeC-processes [11]. The shaded areas in fig. 4.7b represent the values of lift and drag as presented in [23] by 7 other investigators. As the best reference results from [23] we selected those obtained by Schmidt & Jameson. For the lift and drag they find $c_l = 0.3472$, $c_d = 0.0557$, whereas we find (after the 10th DeC-iteration) $c_l = 0.3565$, $c_d = 0.0582$.

In fig. 4.7c we show a contour plot of the Mach number distribution and compare it with the distribution obtained by Schmidt & Jameson. Both distributions show a good (i.e. sharp and monotone) capturing of the two shock waves and of the slip line leaving the airfoil's tail. Concerning the sharpness of the discontinuities, it should be noted that Schmidt & Jameson used a 320 × 64 (!) O-type grid.
Fig. 4.5: Entropy distributions along lower channel wall (a. first-order; b. second-order).


In fig. 4.7d and 4.7e we show contour plots of our pressure and entropy distributions. (No reference results are available.) The pressure distribution clearly shows the smoothness of the pressure across the slip line (up to the airfoil's tail): the Kutta condition is satisfied automatically. The entropy distribution $s/s_\infty - 1$ shows a convection of spurious entropy, generated at the airfoil's nose, of only 0.003. Even more clearly than the Mach number distribution, the entropy distribution shows the good capturing of all three discontinuities. The slight spreading of the slip line in downstream direction is due only to the grid enlargement in this direction.

In [11] it is shown for five different airfoil flows that on average 5 DeC-iterations are needed to drive the lift to within ½% of its final value. (The drag appeared to converge even faster in most cases.) On the single-pipe Cyber 205 on which we performed our computations, 5 DeC-iterations on a 128 × 32 grid take ~100 sec in scalar mode (i.e. ~5 msec per volume and per iteration). In vector mode they take ~50 sec. We did not extensively tune our code for vector computers, since the method brings with it some severe inhibitors to vectorization. However, for large-scale computations where not all data can be kept in core, an advantage of the present method is the small number of iterations required. (For most Euler codes this number is significantly larger.) If not all data can be kept in core, a small number of iterations results in a small data-transport load. Since I/O times rather than CPU times may be the bottleneck in large-scale computations on vector computers, we consider this feature an extra advantage of the present method.
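As a quick check of the timing figures quoted above, the per-volume cost can be recomputed from the grid size and the number of DeC-iterations. The sketch below only redoes that arithmetic; the constant and function names are illustrative, and the times are the approximate values from the text.

```python
# Per-volume cost of the Cyber 205 run quoted in the text
# (approximate timings; names are illustrative only).
N_VOLUMES = 128 * 32        # finite volumes in the 128 x 32 grid
N_DEC_ITERATIONS = 5        # DeC-iterations that were timed


def ms_per_volume_iteration(total_time_s: float) -> float:
    """Average cost in milliseconds per finite volume and per DeC-iteration."""
    return 1000.0 * total_time_s / (N_VOLUMES * N_DEC_ITERATIONS)


print(f"scalar mode: {ms_per_volume_iteration(100.0):.2f} ms")
print(f"vector mode: {ms_per_volume_iteration(50.0):.2f} ms")
```

This prints about 4.88 ms for scalar mode, consistent with the rounded ~5 msec in the text, and about 2.44 ms for vector mode.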
Fig. 4.6: 128 × 32-grid, NACA0012-airfoil (b. in detail).
5. Conclusions

For the computation of non-smooth flows with the steady Euler equations, defect correction and nonlinear multigrid are found to be very efficient tools. A second-order accurate solution is obtained
already  after  a  few  DeC-iterations.  For  each  DeC-iteration,  a  first-order  system  with  an  appropriate 
right-hand  side  has  to  be  solved  approximately.  This  is  done  by  a  FAS-iteration  method.  It  appears 
that  a  single  FAS-iteration  is  already  sufficient. 
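To make the DeC idea concrete, the following sketch applies it to a linear 1-D model problem, u' = cos x on (0, 1] with u(0) = 0; this model is our own assumption and not one of the flow problems above. The first-order upwind operator L1 plays the role of the easily solved first-order system, the second-order upwind operator L2 defines the target discretization, and each DeC-iteration solves L1 (u_new − u) = f − L2 u. A direct solve stands in for the single FAS-iteration used in the paper.

```python
import numpy as np


def upwind_operators(n: int, h: float):
    """First- and second-order upwind approximations of d/dx at x_i = i*h,
    i = 1..n, with the inflow value u(0) = 0 eliminated."""
    eye, sub1, sub2 = np.eye(n), np.eye(n, k=-1), np.eye(n, k=-2)
    L1 = (eye - sub1) / h                          # (u_i - u_{i-1}) / h
    L2 = (3 * eye - 4 * sub1 + sub2) / (2 * h)     # (3u_i - 4u_{i-1} + u_{i-2}) / 2h
    L2[0] = L1[0]                                  # first point: first-order stencil
    return L1, L2


def defect_correction(L1, L2, f, iterations: int):
    """DeC: solve with the first-order operator, driven by the second-order defect."""
    u = np.linalg.solve(L1, f)                     # plain first-order solution
    for _ in range(iterations):
        u = u + np.linalg.solve(L1, f - L2 @ u)    # L1 (u_new - u) = f - L2 u
    return u


n = 100
h = 1.0 / n
x = h * np.arange(1, n + 1)
f = np.cos(x)                                      # exact solution: u = sin x
L1, L2 = upwind_operators(n, h)
u1 = np.linalg.solve(L1, f)
u_dec = defect_correction(L1, L2, f, iterations=5)
for name, u in [("first-order", u1), ("DeC, 5 iterations", u_dec)]:
    print(name, np.max(np.abs(u - np.sin(x))))
```

After a few iterations the DeC iterate agrees with the second-order discrete solution to well below its truncation error, which mirrors the observation that a few DeC-iterations suffice in practice.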

The scheme used is a second-order Osher upwind scheme supplied with the Van Albada limiter. The solutions obtained show a good resolution of all flow phenomena and are obtained at low computational cost.
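The limiter is only named here. A common form of the Van Albada limiter, assumed in the sketch below rather than taken from the paper, computes a limited slope from two neighbouring differences a and b as (a(b² + ε) + b(a² + ε)) / (a² + b² + 2ε); the small ε is our own choice and only avoids division by zero in flat regions.

```python
def van_albada_slope(a: float, b: float, eps: float = 1e-12) -> float:
    """Van Albada-type limited slope from two neighbouring differences a and b.

    Returns a when a == b and 0 when a == -b; eps (an assumed small
    constant) keeps the expression finite when a and b both vanish.
    This smooth form does not clip opposite-sign differences to zero.
    """
    return (a * (b * b + eps) + b * (a * a + eps)) / (a * a + b * b + 2.0 * eps)


# Usage on a 1-D cell: backward and forward differences of cell averages.
u = [1.0, 2.0, 2.5]
slope = van_albada_slope(u[1] - u[0], u[2] - u[1])
u_face_right = u[1] + 0.5 * slope   # limited extrapolation to the right face
```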

An  important  property  of  the  present  method  is  that  it  is  completely  parameter-free;  it  needs  no 
tuning  of  parameters. 
INTRODUCTION

A multigrid scheme for solving the 2-D steady Euler equations is presented. The method applies to arbitrary finite element triangulations. A MUSCL-type second-order accurate upwind spatial approximation is used. The successive levels are derived from the triangulation by agglomerating the control volumes of an ad hoc dual mesh. A generalized finite-volume formulation is employed on these coarser levels. A full approximation storage pseudo-unsteady scheme is constructed. Applications to transonic flow calculations are presented.


1.  SCOPE  OF  THE  PAPER 

Finite  element  schemes  applying  on  unstructured  triangulations  (2-D)  or 
tetrahedrizations  (3-D)  have  proved  to  be  a  convenient  tool  to  calculate  Euler 
flows  around  complete  geometries.  This  option  has  permitted  calculations  about 
complete  commercial  aircraft  configurations  [1].  It  has  also  permitted  hypersonic 
flow  predictions  around  a  space  vehicle  [2,13].  For  such  calculations,  efficiency 
is  an  important  issue  (among  others  such  as  accuracy,  not  discussed  here). 

An efficient and robust implicit algorithm has been introduced for calculations with unstructured meshes [3] and extended to 2-D and 3-D upwind approximations [4,13]. Important gains in efficiency are obtained but, in this approach, even for the most efficient version, core-memory requirements are large.

The purpose of this paper is to study the explicit multigrid approach applied to unstructured meshes.

This option is regarded as an important way to reach higher efficiency and has been previously advocated by LOHNER and MORGAN [5] and MAVRIPLIS and JAMESON [14]. These authors used a sequence of non-nested grids for the different levels.

In  this  paper  we  investigate  the  possibility  of  using  nested  levels. 

The  "nested"  option  implies  the  following  properties  : 

(a) the different levels or grids are easily derived, and this results in computer and user time savings. This point is essential since the extension to 3-D is the ultimate goal.

(b) grid-to-grid transfers are straightforward to derive (simple and cheap).

The method used to generate the different grids is the main point distinguishing the various approaches. Two families of methods can be considered:


Topological  methods 

Generally they are based on refinement. Starting from an unstructured coarse grid, finer grids are generated by element division, either over the whole computational domain or only locally, after a posteriori error estimates. We refer to [6,9,10,15,16] for studies using these two points of view.


Algebraic  methods 

Starting  from  a  linear  system  derived  from  an  arbitrary  unstructured  fine  mesh 
formulation,  coarser  levels  are  generated  by  gathering  related  equations  (lines  of 
the  matrix)  ;  we  refer  to  [7]. 
Methods of the first family are rather efficient but do not directly apply to the solution of the Euler equations with an arbitrary unstructured fine mesh. Methods of the second family are usually applied to linear systems and necessitate the construction, and generally also the storage, of the matrix.

We  propose  a  new  topological  approach  with  the  following  features  : 

-  the  coarse  meshes  are  not  classical  FEM  triangulations  but  generalized  finite 
volume  partitions  ; 

-  the  spatial  approximation  is  derived  on  each  level  ; 

- a full approximation storage (nonlinear) scheme is employed.

In Section 2 we discuss the generation of the coarse levels. Section 3 presents the upwind spatial approximation. The ingredients of the MG
algorithm  are  described  in  Section  4.  The  efficiency  of  the  method  is  illustrated 
by  numerical  experiments  in  Section  5. 


2.  THE  GENERATION  OF  THE  DIFFERENT  LEVELS 

The  objective  is  to  generate  coarse  levels  automatically  from  an  arbitrary 
unstructured  triangulation. 

To achieve such a degree of reliability, we explore the possibility of grouping together nodes associated with contiguous control volumes. Thus, coarse levels are not produced by a new triangulation of the domain. However, identifying nodes with control volumes permits a homogeneous description of the different levels in terms of finite volume partitions.

Finite  Volume  Dual  mesh  : 

Indeed, it has been observed that simplicial (triangles, tetrahedra) Galerkin approximations are equivalent in some sense to ad hoc finite volume formulations on specific dual meshes: for the two-dimensional case, the dual mesh is derived using the medians of the triangles.
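For the median dual, each control volume collects, from every triangle around its vertex, the quadrilateral bounded by the vertex, two edge midpoints and the centroid, which covers exactly one third of the triangle. A minimal NumPy sketch of the resulting cell areas follows; the input layout (an (N, 2) array of vertices and a list of vertex-index triples) is a hypothetical choice.

```python
import numpy as np


def median_dual_areas(points, triangles):
    """Area of the median-dual control volume around each vertex.

    The medians split every triangle into three quadrilaterals of equal
    area, so each vertex receives one third of each adjacent triangle.
    """
    pts = np.asarray(points, dtype=float)
    tri = np.asarray(triangles, dtype=int)
    p0, p1, p2 = pts[tri[:, 0]], pts[tri[:, 1]], pts[tri[:, 2]]
    d1, d2 = p1 - p0, p2 - p0
    tri_area = 0.5 * np.abs(d1[:, 0] * d2[:, 1] - d1[:, 1] * d2[:, 0])
    areas = np.zeros(len(pts))
    for k in range(3):
        np.add.at(areas, tri[:, k], tri_area / 3.0)   # one third to each vertex
    return areas


# Unit square split along its diagonal into two triangles.
areas = median_dual_areas([(0, 0), (1, 0), (1, 1), (0, 1)], [(0, 1, 2), (0, 2, 3)])
```

The dual cells tile the domain, so the areas sum to the area of the triangulation (1 for the unit square).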


Coarsening  agglomerating  algorithm  : 
Grouping together control volumes results in a new (coarser) mesh. Repeating
the  operation  allows  us  to  get  coarser  and  coarser  levels  until  sufficiently  many 
levels  are  obtained. 

Notice  that  the  process  is  not  limited  by  the  number  of  points  in  one  direction, 
but  only  by  the  overall  number  of  points. 

An  algorithm  for  grouping  cells  together  should  satisfy  the  following  criteria  : 

(1)  The  size  of  the  mesh  should  decrease  while  the  maximum  allowable 
time  step  (for  explicit  iterations)  should  increase. 

(2) The solution should be represented accurately enough on coarser grids to obtain a good initialization (Full-MG) and good preconditioners,

(3) The sequence of nested grids should allow the damping of a dense enough collection of frequency modes,

(4)  The  procedure  should  not  be  costly. 

One approach could consist of using some auxiliary regular coarser mesh which divides the domain into regions, in order to gather the cells whose centers belong to the same region. Such an approach may not sufficiently account for the density of the initial mesh.

Some more sophisticated methods could be considered: we could derive them from work motivated by multi-tasking on supercomputers; the problem is to divide the domain into regions which are (1) of comparable size (number of nodes) and (2) with as few connections between each other as possible (and therefore with as many connections as possible within each region). The sophistication can be increased up to the examination of the discrete equation, as in algebraic MG methods.

The study of these possibilities is in progress; in this paper, some experiments will be presented with coarser meshes generated with a trivial (and very cheap) algorithm: in only one double 'do-loop', we consider each cell in turn; if the cell has not yet been grouped with other cells to form a zone of the coarser mesh, then a new zone is formed that contains this cell and all its direct neighbours (i.e. those sharing a common boundary) that are not yet included in another zone.

One  obvious  disadvantage  of  this  method  is  that  coarse  levels  can  destroy  the 
possible  symmetry  properties  of  the  fine  mesh. 
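The one-pass grouping rule described above can be sketched as follows. The adjacency-list input and the structured test grid are hypothetical conveniences; on a real dual mesh the neighbour lists would come from the shared control-volume boundaries.

```python
def agglomerate(neighbors):
    """Greedy coarsening: visit cells in order; an unassigned cell starts a new
    zone and pulls in all of its still-unassigned direct neighbours.
    Returns zone[i], the coarse zone that contains fine cell i."""
    zone = [-1] * len(neighbors)
    n_zones = 0
    for i, nbrs in enumerate(neighbors):   # outer loop over cells
        if zone[i] != -1:
            continue                       # already grouped
        zone[i] = n_zones
        for j in nbrs:                     # inner loop over direct neighbours
            if zone[j] == -1:
                zone[j] = n_zones
        n_zones += 1
    return zone


def grid_neighbors(nx, ny):
    """Hypothetical test input: 4-neighbour adjacency of an nx x ny cell array."""
    nbrs = []
    for r in range(ny):
        for c in range(nx):
            cell = []
            if c > 0:
                cell.append(r * nx + c - 1)
            if c < nx - 1:
                cell.append(r * nx + c + 1)
            if r > 0:
                cell.append((r - 1) * nx + c)
            if r < ny - 1:
                cell.append((r + 1) * nx + c)
            nbrs.append(cell)
    return nbrs


print(agglomerate(grid_neighbors(3, 3)))   # [0, 0, 1, 0, 2, 1, 3, 2, 4]
```

On this 3 × 3 example, cells 6 and 8 end up as single-cell zones and the zones are not symmetric about the grid centre, which illustrates how the visiting order can destroy symmetry in the coarse levels.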
3. GENERALIZED SPATIAL FINITE VOLUME UPWIND
SCHEMES 

One main feature of the algorithm is that it relies on the construction of a finite-volume method applicable to an arbitrary partition of the computational domain. In this paper, this construction is detailed in the case of a first-order accurate upwind scheme. In a forthcoming paper, second-order extensions based on either central or upwind approximations are derived using a similar construction.


3.1.  First-order  scheme 

The time-dependent Euler equations are written in conservative form:

$$ W_t + F(W)_x + G(W)_y = 0, $$

in which, as usual,

$$ W = (\rho, \rho u, \rho v, E), $$

where ρ is the density, (u, v) the velocity and E the total energy per unit volume.


The  upwind  finite  volume  scheme  is  derived  in  the  simplest  manner  that  one  can 
imagine.  We  describe  it  in  the  context  of  the  usual  explicit  time-stepping. 

Given a cell C_i, the mean value W_i of the dependent variable in this cell is advanced from time level n to time level n+1 as follows:

$$ \mathrm{area}(C_i)\,\big[W_i^{n+1} - W_i^n\big] = -\Delta t \sum_{j \text{ neighbor of } i} \Phi\big(W_i^n, W_j^n, \vec\eta_{ij}\big), $$

where η_ij is the following metric vector:

$$ \eta^x_{ij} = \int_{\partial C_i \cap \partial C_j} \nu_x \, d\sigma, \qquad \eta^y_{ij} = \int_{\partial C_i \cap \partial C_j} \nu_y \, d\sigma, $$

with ν = (ν_x, ν_y) the normal vector pointing outward, and where Φ is a flux splitting consistent with η^x F + η^y G.

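In this paper, the following splitting is used:

$$ \Phi(W_i, W_j, \vec\eta_{ij}) = \frac{\mathcal{H}(W_i) + \mathcal{H}(W_j)}{2} + \frac{1}{2}\,\Big|P\Big(\frac{W_i + W_j}{2}\Big)\Big|\,(W_i - W_j), \qquad \mathcal{H}(W) = \eta^x_{ij} F(W) + \eta^y_{ij} G(W), $$

with the notation

$$ P(W) = \eta^x_{ij}\,\frac{\partial F}{\partial W}(W) + \eta^y_{ij}\,\frac{\partial G}{\partial W}(W). $$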
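For a scalar conservation law u_t + div(V u) = 0 with constant velocity V, the Jacobian in a splitting of this type reduces to the number a_ij = V·η_ij, so a centred average plus a dissipation term ½|a_ij|(u_i − u_j) is plain upwinding. The sketch below is our own scalar illustration, not the Euler solver; it shows that the edge-based update needs only cell areas and interface vectors η_ij, and therefore works on an arbitrary partition. The edge-list layout (i, j, η^x_ij, η^y_ij) is hypothetical.

```python
import numpy as np


def upwind_step(u, area, edges, velocity, dt):
    """One explicit step of area_i (u_i^{n+1} - u_i^n) = -dt * sum_j Phi_ij for
    u_t + div(V u) = 0 with constant V.  edges holds (i, j, eta_x, eta_y) once
    per interface, with eta_ij pointing from cell i to cell j."""
    residual = np.zeros_like(u)
    for i, j, eta_x, eta_y in edges:
        a = velocity[0] * eta_x + velocity[1] * eta_y   # a_ij = V . eta_ij
        # centred average + upwind dissipation: a*u_i if a > 0, a*u_j if a < 0
        flux = 0.5 * a * (u[i] + u[j]) + 0.5 * abs(a) * (u[i] - u[j])
        residual[i] += flux    # flux leaving cell i ...
        residual[j] -= flux    # ... enters cell j, since eta_ji = -eta_ij
    return u - dt * residual / area


# Periodic 1-D chain of four unit cells with V = (1, 0).
edges = [(0, 1, 1.0, 0.0), (1, 2, 1.0, 0.0), (2, 3, 1.0, 0.0), (3, 0, 1.0, 0.0)]
u_new = upwind_step(np.array([1.0, 0.0, 0.0, 0.0]), np.ones(4), edges, (1.0, 0.0), 0.5)
```

Because the flux added to cell i is subtracted from cell j, the sum of area_i u_i is conserved up to round-off on a closed or periodic partition.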


3.2.  Stability 

The efficiency, and to some extent the robustness, of the algorithm rely on the accurate estimation of the maximal time step; this is particularly essential when local time-stepping is employed.

Unfortunately, a local time step evaluated from a simplified Fourier analysis can be very hazardous. Hence, we prefer to evaluate a lower bound based on the L∞ stability of a two-dimensional model.

Two models are useful for this study:

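1 - Constant-velocity advection:

It is written:

$$ u_t + \vec V \cdot \nabla u = 0 \quad \text{in } \mathbb{R}^2, \quad \text{with } \vec V \in \mathbb{R}^2. $$

A standard upwind discretization is the following:

$$ \mathrm{area}(C_i)\,(u_i^{n+1} - u_i^n) = -\Delta t \sum_{j \text{ neighbor of } i} a_{ij}\,\big(\theta_{ij}\,u_i^n + (1 - \theta_{ij})\,u_j^n\big), $$

$$ a_{ij} = \eta^x_{ij} V^x + \eta^y_{ij} V^y, \qquad \theta_{ij} = \tfrac{1}{2}\big(\mathrm{sign}(a_{ij}) + 1\big), $$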

with 
and where is defined below.


LEMMA 1: The above 2-D advection scheme satisfies the maximum principle if, for every cell C_i, the following inequality holds:

$$ \mathrm{area}(C_i) - \Delta t \int_{\partial C_i^+} \vec V \cdot \vec\nu \, d\sigma \ \ge\ 0, $$

where ∂C_i^+ denotes the part of ∂C_i where V·ν is positive.
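For the edge-based form of the scheme with constant V, Σ_j a_ij = 0 on a closed cell, and the update is a convex combination of u_i^n and its upwind neighbours whenever 1 − (Δt/area(C_i)) Σ_j max(a_ij, 0) ≥ 0. The sketch below, using a hypothetical edge list (i, j, η^x_ij, η^y_ij), computes the largest Δt that satisfies this discrete condition in every cell. Since Σ_j max(a_ij, 0) never exceeds the integral of V·ν over ∂C_i^+, a time step satisfying the lemma's inequality also satisfies this bound.

```python
import numpy as np


def max_principle_dt(area, edges, velocity):
    """Largest dt with area_i - dt * sum_j max(a_ij, 0) >= 0 in every cell,
    where a_ij = V . eta_ij and edges lists (i, j, eta_x, eta_y) once per interface."""
    outflow = np.zeros(len(area))
    for i, j, eta_x, eta_y in edges:
        a = velocity[0] * eta_x + velocity[1] * eta_y
        outflow[i] += max(a, 0.0)     # outgoing flux for cell i
        outflow[j] += max(-a, 0.0)    # a_ji = -a_ij: outgoing flux for cell j
    with np.errstate(divide="ignore"):
        dt_cell = np.where(outflow > 0.0, np.asarray(area, float) / outflow, np.inf)
    return dt_cell.min(), dt_cell


# Periodic chain of four unit cells, V = (2, 0): each cell has outflow 2.
edges = [(0, 1, 1.0, 0.0), (1, 2, 1.0, 0.0), (2, 3, 1.0, 0.0), (3, 0, 1.0, 0.0)]
dt_max, dt_cells = max_principle_dt(np.ones(4), edges, (2.0, 0.0))
```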

2 - Non-constant velocity:

This case has to be studied in a conservative formulation:

$$ u_t + \mathrm{div}(\vec V u) = 0, $$

where V is given but not constant, V = V(x,y).

It is reasonable to consider that a numerical scheme which approximates this conservation law is stable if it preserves the positivity of the dependent variable.


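The conservative scheme is derived:

$$ \mathrm{area}(C_i)\,(u_i^{n+1} - u_i^n) = -\Delta t \sum_{j \text{ neighbor of } i} a_{ij}\,\big(\theta_{ij}\,u_i^n + (1 - \theta_{ij})\,u_j^n\big) $$

with

$$ a_{ij} = \eta^x_{ij}\,\frac{V^x_i + V^x_j}{2} + \eta^y_{ij}\,\frac{V^y_i + V^y_j}{2}, \qquad \theta_{ij} = \tfrac{1}{2}\big(\mathrm{sign}(a_{ij}) + 1\big). $$

LEMMA 2: The above scheme preserves positivity if the following inequality holds for every cell C_i:

$$ \mathrm{area}(C_i) - \Delta t \, \max\Big(\|\vec V_i\|, \max_{j \text{ neighbor of } i} \|\vec V_j\|\Big) \int_{\partial C_i} d\sigma \ \ge\ 0. $$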
Application to the Euler scheme:

The following "reference time step" will be computed on each cell:

$$ \Delta t_i = \frac{\mathrm{area}(C_i)}{\lambda_i^{\max} \displaystyle\int_{\partial C_i} d\sigma} $$

with

$$ \lambda_i^{\max} = \max\Big(\lambda_i, \max_{j \text{ neighbor of } i} \lambda_j\Big), \qquad \lambda_i = (u_i^2 + v_i^2)^{1/2} + c_i, $$

where u_i, v_i, c_i denote the values in cell C_i of (resp.) the horizontal velocity, the vertical velocity, and the sound speed.


In practice, time steps larger than Δt_i by a factor of 3 can be used (L²-)stably, and a good strategy for multigridding is to set Δt in the range 2.5 Δt_i to 3 Δt_i.
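The reference time step is cheap to evaluate cell by cell. A minimal sketch follows; the array layout (per-cell area, boundary length ∫ dσ over ∂C_i, velocity components, sound speed, neighbour lists) is a hypothetical choice, and the factor 2.5 in the usage line is the lower end of the range recommended above.

```python
import numpy as np


def reference_time_steps(area, boundary_length, u, v, c, neighbors):
    """Dt_i = area(C_i) / (lambda_i^max * |dC_i|), with lambda_i = |(u_i, v_i)| + c_i
    and lambda_i^max the largest lambda over cell i and its neighbours."""
    lam = np.sqrt(np.asarray(u) ** 2 + np.asarray(v) ** 2) + np.asarray(c)
    lam_max = np.array([max([lam[i]] + [lam[j] for j in nbrs])
                        for i, nbrs in enumerate(neighbors)])
    return np.asarray(area, float) / (lam_max * np.asarray(boundary_length, float))


# Two unit-area cells with boundary length 4 that are neighbours of each other.
dt_ref = reference_time_steps([1.0, 1.0], [4.0, 4.0], [1.0, 0.0], [0.0, 0.0],
                              [1.0, 1.0], [[1], [0]])
dt_used = 2.5 * dt_ref   # lower end of the recommended 2.5-3 range
```

Taking the maximum over the neighbours makes the slower cell use the faster neighbour's wave speed, so both cells here get Δt_i = 1/(2·4) = 0.125.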

3.3.  Second-order  spatial  scheme  (fine  level) 

We  present  in  this  paper  a  second-order  spatial  scheme  that  is  only  applied  on 
the  finest  grid  level  defined  by  a  triangulation. 

The  scheme  uses  some  ideas  of  the  MUSCL  approach  and  is  introduced  in  [4,13]  ; 
we  recall  it  briefly  : 


An approximate gradient (W_x(i), W_y(i)) of W at each vertex i is derived from the Galerkin linear interpolation of W over all the triangles having i as a vertex:

$$ W_x(i) = \frac{\displaystyle\int_{\mathrm{Supp}(i)} \frac{\partial W}{\partial x}\,dx\,dy}{\displaystyle\int_{\mathrm{Supp}(i)} dx\,dy}, \qquad W_y(i) = \frac{\displaystyle\int_{\mathrm{Supp}(i)} \frac{\partial W}{\partial y}\,dx\,dy}{\displaystyle\int_{\mathrm{Supp}(i)} dx\,dy}, $$

where the notation Supp(i) denotes the union of triangles having i as a vertex.

Then, for each pair of neighboring vertices (i,j), we compute the following extrapolated values:

$$ W_{ij} = W_i + \tfrac{1}{2}\,\nabla W(i) \cdot \vec{ij}, \qquad W_{ji} = W_j + \tfrac{1}{2}\,\nabla W(j) \cdot \vec{ji}. $$

These values can be replaced, following the MUSCL process, by limited values (W̃_ij, W̃_ji); we refer to [4,13].

Then the second-order accurate flux is written:

$$ \Phi_{ij} = \Phi\big(W_{ij}, W_{ji}, \vec\eta_{ij}\big). $$
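The gradient recovery and extrapolation steps can be sketched as follows, without limiting. The mesh layout is hypothetical. Each triangle contributes its constant P1 gradient weighted by its area, which is the ratio of integrals over Supp(i) used for W_x(i) and W_y(i).

```python
import numpy as np


def nodal_gradients(points, triangles, w):
    """Area-weighted average over Supp(i) of the constant P1 gradients of w."""
    pts = np.asarray(points, float)
    w = np.asarray(w, float)
    grad_sum = np.zeros((len(pts), 2))
    area_sum = np.zeros(len(pts))
    for tri in triangles:
        p = pts[list(tri)]
        jac = np.array([p[1] - p[0], p[2] - p[0]])          # rows: edge vectors
        rhs = [w[tri[1]] - w[tri[0]], w[tri[2]] - w[tri[0]]]
        grad = np.linalg.solve(jac, rhs)                     # gradient on this triangle
        area = 0.5 * abs(np.linalg.det(jac))
        for k in tri:
            grad_sum[k] += area * grad
            area_sum[k] += area
    return grad_sum / area_sum[:, None]


def extrapolate(points, w, grad, i, j):
    """Unlimited values W_ij = W_i + 0.5 grad(i).ij and W_ji = W_j + 0.5 grad(j).ji."""
    pts = np.asarray(points, float)
    ij = pts[j] - pts[i]
    return w[i] + 0.5 * grad[i] @ ij, w[j] - 0.5 * grad[j] @ ij


points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
triangles = [(0, 1, 2), (0, 2, 3)]
w = [2 * x + 3 * y for x, y in points]       # linear test field W = 2x + 3y
grad = nodal_gradients(points, triangles, w)
w_ij, w_ji = extrapolate(points, w, grad, 0, 2)
```

For a linear field the recovered gradient is exact at every vertex, and both extrapolated values on an edge equal the value at its midpoint (2.5 for the diagonal here).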


3.4.  Boundary  conditions 

They  are  introduced  in  a  similar  manner  for  each  level.  At  farfield  boundaries,  a 
(first-order)  splitting  using  inner  and  farfield  values  is  applied. 

Wall  conditions  are  weakly  introduced  through  a  pressure  boundary  integral. 


4.  MULTIGRID  SCHEME 
4.1.  Basic  Iteration  Method 

Following A. JAMESON [11], a Runge-Kutta scheme is applied with either one or four stages; in the second option, the following coefficients are employed (see [9,10]):

α1 = .11, α2 = .2766, α3 = .5, α4 = 1.
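The stage formula is not written out in the text. The sketch below assumes the usual Jameson-type multistage form W^(k) = W^(0) − α_k Δt R(W^(k−1)) for dW/dt = −R(W), with the four coefficients listed above; with alphas=(1.0,) it reduces to the one-step (forward Euler) option.

```python
def multistage_step(w0, residual, dt, alphas=(0.11, 0.2766, 0.5, 1.0)):
    """Multistage step for dW/dt = -R(W):
    W^(k) = W^(0) - alpha_k * dt * R(W^(k-1)), k = 1..m; returns W^(m)."""
    w = w0
    for alpha in alphas:
        w = w0 - alpha * dt * residual(w)   # every stage restarts from W^(0)
    return w


# Scalar test problem dw/dt = -w, i.e. R(w) = w.
w_four = multistage_step(1.0, lambda w: w, 0.1)
w_one = multistage_step(1.0, lambda w: w, 0.1, alphas=(1.0,))
```

For this scalar test with Δt = 0.1, the four-stage step gives about 0.90486, close to exp(−0.1) ≈ 0.90484, while the one-stage option gives 0.9. The same function works unchanged with NumPy arrays as W.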
4.2. MG scheme

The  basic  algorithm  uses  FAS  iterations  and  is  not  basically  different  from  the 
one  described  in  [8].  The  main  difference  lies  in  the  definition  of  the  transfer 
operators  that  are  much  simpler  in  the  present  approach  : 

-  Fine-to-coarse  :  values  are  averaged  in  a  conservative  manner. 

-  Coarse-to-fine  :  the  trivial  injection  is  applied. 
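Because coarse zones are unions of fine control volumes, both transfers reduce to index operations. A minimal NumPy sketch follows, assuming a hypothetical array zone[i] that maps each fine cell to its coarse zone. Restriction takes area-weighted averages, so the sum of area·u is preserved, and prolongation is plain injection, here applied to an FAS-style correction.

```python
import numpy as np


def restrict(u_fine, area_fine, zone, n_zones):
    """Fine-to-coarse: area-weighted (conservative) average over each zone."""
    area_coarse = np.bincount(zone, weights=area_fine, minlength=n_zones)
    u_coarse = np.bincount(zone, weights=area_fine * u_fine, minlength=n_zones) / area_coarse
    return u_coarse, area_coarse


def inject(u_coarse, zone):
    """Coarse-to-fine: every fine cell takes the value of its zone."""
    return u_coarse[zone]


zone = np.array([0, 0, 1])          # fine cells 0 and 1 form zone 0, cell 2 forms zone 1
area = np.array([1.0, 3.0, 2.0])
u = np.array([2.0, 6.0, 4.0])
u_c, area_c = restrict(u, area, zone, n_zones=2)
u_c_new = u_c + np.array([1.0, 0.0])            # stand-in for a coarse-grid solve
u_corrected = u + inject(u_c_new - u_c, zone)   # FAS-style correction
```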


4.3.  Second-order  MG  version 

The second-order spatial scheme is introduced into the fine-grid solver only for the third (three-grid) phase of the full-multigrid process. This introduces only a minor modification in the algorithm. Two disadvantages appear in this construction, however: first, the coarse-level correction is less consistent with the fine-level smoother; second, in a full-multigrid approach, the third phase starts from a first-order (medium-level) solution instead of a second-order one. It will be seen in Section 5, though, that this does not result in too severe a convergence degradation.


5.  NUMERICAL  ILLUSTRATION 

In this preliminary paper, special attention is paid to obtaining first-order accurate solutions; they are easier to obtain because of the internal dissipation; moreover, second-order solutions can be derived from first-order ones without using the above second-order version (see [17]).


5.1.  Two  experiments  with  nested  meshes 

The  calculations  of  an  internal  flow  in  a  channel  with  a  4.2%  thick  bump  are 
presented.  It  has  been  observed  that  the  regime  defined  by  a  Mach  number  at 
infinity  equal  to  .85  is  representative  of  the  usual  stiffness  of  such  a  problem. 

A first mesh is presented in FIG. 1 and contains 161 vertices. The three successive levels are also depicted: the dual fine level (161 control volumes), the medium level, and the coarse level. The convergence histories are shown for

- standard one-grid calculations with each of the three levels (the initial data correspond to uniform flow),

- a one-grid calculation over the successive levels using as initial data the result obtained from the previously employed coarser grid,

- a full-multigrid calculation, that is, a one-grid scheme with the coarse level, then a two-grid scheme with the medium one, and a three-grid scheme with the fine triangulation.

Figures 1a to 1d: Flow in a channel with a 4.2% thick circular bump. Mach at infinity = .85; 1a: triangulation; 1b: dual mesh; 1c: medium level; 1d: coarse level.

Figures 1e to 1h: Convergences: 1e: one-grid with each grid; 1f: one-grid with coarse-level initial solution; 1g: FMG, first-order; 1h: FMG, second-order.

Figures 2a to 2d: Same as Fig. 1, with a finer triangulation.

Lallemand and Dervieux

Figures 3a to 3d: Flow around a NACA 0012 airfoil. Mach at infinity = .72, angle of attack = 0 deg.; 3a: triangulation; 3b: dual mesh; 3c, 3d: coarser meshes.

Figures 4a to 4b: Flow around a NACA 0012 airfoil, continued. Mach and pressure contours.

In order to evaluate the behavior of the scheme when the number of nodes is increased, we present the same experiment (FIG. 2a to 2h) with a finer triangulation derived from the previous one by dividing each triangle equally into four new ones. The triangulation now contains 585 vertices. However, again only three grids are employed. A comparison of the convergence histories shows that the convergence rate is nearly constant in each phase, and approximately equal to the coarse-grid convergence rate.

As advised by the editing referee, we add a recently obtained first-order four-grid FMG calculation. In the fourth phase, the residual is reduced by three orders of magnitude after 155 cycles (see FIG. 2j). The fourth (coarsest) level is shown in FIG. 2i.


5.2.  Application  to  a  strictly  unstructured  mesh 

The  efficiency  is  now  evaluated  with  the  calculation  of  an  external  flow  around  a 
NACA0012  airfoil  (Mach  =.72,  angle  of  attack  =  0  deg.).  The  triangulation  is 
now  really  unstructured  :  it  results  from  the  use  of  a  mesh  generator  based  on  an 
element front algorithm and contains 800 vertices (FIG. 3a). The convergence is
again  fast  ;  the  second  order  accurate  solution  is  obtained  in  about  150  three-grid 
cycles  in  the  third  phase  when  the  one-step  Runge-Kutta  is  applied  (FIG.3f) 
while  the  convergence  of  the  first-order  version  required  only  80  cycles  (FIG.3e). 
The  loss  of  symmetry  in  the  coarse  grids  does  not  seem  severely  penalizing.  The 
Mach  and  pressure  contours  of  the  resulting  solution  are  shown  in  FIG.4. 


5.3.  Comparison  with  a  previous  approach 

Another important experiment is the comparison with a more classical MG algorithm that can be described as follows:

Multi-triangulation algorithm: three triangulations are nested in the standard way (by element division) and the spatial scheme is the second-order upwind scheme at each level; we refer to [8,10]. The successive nested triangulations contain respectively 121, 442, and 1684 vertices.

We  present  three  calculations  relying  on  the  fine  triangulation,  using  four-step 
Runge-Kutta  three-grid  algorithms  and  starting  from  uniform  flow. 

A  comparison  of  the  convergence  histories  of  the  two  algorithms  is  presented  in 
FIG.5e  : 

-  when  the  Multi-triangulation  algorithm  is  applied  with  second  order  flux- 
splitting  over  the  three  levels,  the  solution  is  obtained  in  about  40  cycles. 

-  when  the  same  algorithm  is  applied,  but  with  first  order  flux-splitting  over  the 
two  coarse  levels  and  the  second  order  splitting  over  the  fine  level,  the  solution  is 
obtained  in  about  80  cycles,  with  a  more  monotone  convergence. 

-  with  the  presented  algorithm,  the  solution  is  obtained  in  about  80  cycles  (with 
the  same  convergence) . 


This suggests that the difference between these two approaches essentially comes from the lack of accuracy of the coarse-grid smoothers.


5.4.  Application  to  a  locally  refined  mesh 

The combination of local mesh refinement and multigrid algorithms is frequently advocated; successive grid levels are constructed by local refinement of the previous grid level. One disadvantage of this approach is that these levels operate only locally, and this may reduce the speed-up with respect to standard global multigridding.

In this section, we wish to demonstrate that the coarsening/agglomerating algorithm enables us to generate global coarse levels, in order to keep the complete multigrid speed-up.

We start from a locally refined mesh, constructed for the calculation of the flow past a cylinder [18]. The fine mesh contains 2141 nodes; then a medium mesh containing 598 zones is derived, and finally a coarse one with 244 zones. The coarsening ratios between the levels (about 3.6 and 2.5) are satisfactory. In FIG. 6a,b,c,d, the different levels are shown to demonstrate the regularity of the partitions. To compare algorithms we considered the case of a freestream Mach number of 0.38. It appears from FIG. 6e that applying the second-order method (on the fine level only), compared to applying the first-order method on all levels, results in a reduction of the convergence rate (in terms of iterations) by a factor noticeably less than 2.


6.  CONCLUSION 

We presented a multigrid approach that applies to an arbitrary finite-element triangulation in an automatic manner. As in algebraic algorithms, only one mesh
has  to  be  handled,  namely  the  triangulation  that  supports  the  steady  discrete 
solution. 

For  the  first-order  approximation,  the  method  is  as  efficient  as  a  standard  one. 

The  second-order  version  that  is  presented  is  easily  derived  but  shows  a  slower 
convergence  ;  new  versions  are  under  development. 

An implicit formulation is also currently being studied.

Further  experiments  are  necessary  in  order  to  validate  the  approach  in  more 
practical  applications  (such  as  3-D  Euler  calculations). 
generation (MEGG) to produce body-fitted coordinates for arbitrary
This is modified using an adjustable artificial density to achieve
computational stability and efficiency. An alternating line Gauss-Seidel
each uniform grid equation. The fast adaptive composite grid method (FAC)
is  applied  to  improve  accuracy  by  local  refinement  near  the  wing  surface. 

1.  INTRODUCTION 

Much  progress  has  been  made  in  transonic  flow  computations  since  the 
historic  paper  by  Murman  and  Cole  [1].  Yet,  in  order  to  solve  very  large- 
scale  problems,  there  is  still  a  critical  need  to  develop  more  efficient 
computing  methods.  Inspired  by  the  early  work  of  Brandt  on  multigrid 
methods  [2],  several  authors  have  studied  the  multilevel  technique  as  a 
basic  solver  for  transonic  flow  equations  (cf.,  Brandt  [3],  Jameson  [4], 
MacCarthy  [5],  Beerstoel  [6],  Jespersen  [7],  Ni  [8],  Shmilovich  [9], 
Johnson  [10],  and  Sankar  [11]).  The  present  paper  is  in  part  a 
continuation  of  these  studies.  Its  ingredients  include: 

multigrid (FAS) as the basic uniform grid solver; multigrid elliptic grid generation (MEGG; cf. [12, 13]); the finite volume discretization method; an adjustable artificial viscosity to achieve computational stability and efficiency; and the fast adaptive composite grid method (FAC; cf. [14]) applied to improve accuracy by local refinement near the wing surface.

2. MODEL PROBLEM

As a first attempt to study the performance of this approach for our prototype problem, we choose uniform inflows with zero angle of attack around the symmetric airfoil NACA 0012. Since the flow is symmetric, we investigate the flow only in the upper half plane. The flow is considered as full potential flow, in which case the axis of symmetry may be treated in the same manner as the wing surface, namely as a solid wall where the normal velocity is zero (Fig. 1).

Fig. 1. Transonic flow around NACA 0012


3.  MULTIGRID  ELLIPTIC  GRID  GENERATION  (MEGG) 

3.1.  EGG  Equations 

EGG with zero control functions P = Q = 0 is applied here (Figs. 2 and 3) to produce a body-fitted grid in the physical domain and a uniform grid in the computational domain. The nonlinear EGG equations (with zero control functions; cf. [12]), which define the mapping from the computational coordinates (ξ, η) to the physical ones (x, y), are

$$ \alpha\,x_{\xi\xi} - 2\beta\,x_{\xi\eta} + \gamma\,x_{\eta\eta} = 0, \qquad (1) $$

$$ \alpha\,y_{\xi\xi} - 2\beta\,y_{\xi\eta} + \gamma\,y_{\eta\eta} = 0, \qquad (2) $$

where α = x_η² + y_η², β = x_ξ x_η + y_ξ y_η and γ = x_ξ² + y_ξ².

3.2. Finite Difference Equations

The finite difference equation corresponding to (1) can be obtained using central differences for the derivatives as follows (subscript P refers to the central point, E to its eastward neighbor, etc.):

$$ \alpha\,\frac{x_E + x_W - 2x_P}{H^2} - 2\beta\,\frac{x_{NE} - x_{NW} - x_{SE} + x_{SW}}{4H^2} + \gamma\,\frac{x_N + x_S - 2x_P}{H^2} = 0. \qquad (3) $$

Equation (3) may be written in the general form

$$ A_P x_P = A_E x_E + A_W x_W + A_N x_N + A_S x_S + A_{NE} x_{NE} + A_{NW} x_{NW} + A_{SE} x_{SE} + A_{SW} x_{SW}, \qquad (4) $$

where

$$ A_E = A_W = \frac{\alpha}{H^2}, \quad A_N = A_S = \frac{\gamma}{H^2}, \quad A_{NE} = A_{SW} = -\frac{\beta}{2H^2}, \quad A_{NW} = A_{SE} = \frac{\beta}{2H^2}, \quad A_P = \frac{2(\alpha + \gamma)}{H^2}. $$

(Note that (4) is a nonlinear algebraic system since α, β and γ are functions of x and y.) We can develop difference equations for y in a similar manner.

Fig. 2. EGG physical grid

Fig. 3. EGG computational grid
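A point Gauss-Seidel sweep for the discrete system (4) can be sketched as below, taking H = 1 in the computational plane and evaluating α, β, γ with central differences at each point just before it is updated. The lexicographic ordering, this lagging of the coefficients, and the function name are assumptions for illustration; the same stencil is applied to y.

```python
import numpy as np


def egg_gauss_seidel_sweep(x, y):
    """One in-place point Gauss-Seidel sweep of the discrete EGG equations
    (H = 1; index i runs in the xi direction, j in the eta direction).
    Boundary values are kept fixed."""
    I, J = x.shape
    for i in range(1, I - 1):
        for j in range(1, J - 1):
            # metric coefficients from central differences at the current iterate
            x_xi, y_xi = 0.5 * (x[i + 1, j] - x[i - 1, j]), 0.5 * (y[i + 1, j] - y[i - 1, j])
            x_eta, y_eta = 0.5 * (x[i, j + 1] - x[i, j - 1]), 0.5 * (y[i, j + 1] - y[i, j - 1])
            alpha = x_eta ** 2 + y_eta ** 2
            beta = x_xi * x_eta + y_xi * y_eta
            gamma = x_xi ** 2 + y_xi ** 2
            for f in (x, y):
                # cross = f_NE - f_NW - f_SE + f_SW
                cross = f[i + 1, j + 1] - f[i - 1, j + 1] - f[i + 1, j - 1] + f[i - 1, j - 1]
                f[i, j] = (alpha * (f[i + 1, j] + f[i - 1, j])
                           + gamma * (f[i, j + 1] + f[i, j - 1])
                           - 0.5 * beta * cross) / (2.0 * (alpha + gamma))


# Perturbed uniform grid with fixed boundary: the sweeps should recover it.
n = 9
xi = np.linspace(0.0, 1.0, n)
X, Y = np.meshgrid(xi, xi, indexing="ij")
x, y = X.copy(), Y.copy()
bump = 0.005 * np.sin(np.pi * X) * np.sin(np.pi * Y)
x[1:-1, 1:-1] += bump[1:-1, 1:-1]
y[1:-1, 1:-1] -= bump[1:-1, 1:-1]
for _ in range(400):
    egg_gauss_seidel_sweep(x, y)
```

A uniform grid has vanishing second and cross differences, so it solves (4) exactly; the test problem checks that repeated sweeps converge back to it. On real airfoil grids plain sweeps slow down on smooth errors, which is what motivates the FAS approach.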


3.3.  Initial  Guess 

Equation  (4)  can  be  solved  by  an  iterative  process,  but  a  good  initial 
guess  is  usually  necessary  for  efficiency.  In  this  paper,  we  study  two 
possible  strategies  for  this.  The  first  is  to  use  transfinite 
interpolation (cf. [12]) from boundaries into the field. With indices 1 ≤ i ≤ I and 1 ≤ j ≤ J, where i = 1 or I or j = 1 or J refers to the boundary, we determine the initial guess x^0 from the boundary values x by

$$ x^0(i,j) = \frac{I-i}{I-1}\,x(1,j) + \frac{i-1}{I-1}\,x(I,j) + \frac{J-j}{J-1}\,x(i,1) + \frac{j-1}{J-1}\,x(i,J) - \frac{(I-i)(J-j)}{(I-1)(J-1)}\,x(1,1) - \frac{(I-i)(j-1)}{(I-1)(J-1)}\,x(1,J) - \frac{(i-1)(J-j)}{(I-1)(J-1)}\,x(I,1) - \frac{(i-1)(j-1)}{(I-1)(J-1)}\,x(I,J). \qquad (5) $$
The second approach we consider is full multigrid (FMG; cf. [2]). The central idea here is to start on very coarse levels, using a basic multigrid cycling scheme to obtain a good approximation there, then cubically interpolate this approximation so it can act as an initial guess for basic multigrid cycles on successively finer levels.


3.4. Full Approximation Scheme (FAS)

Gauss-Seidel relaxation for solving system (4) typically stalls after a few iterations. This is because Gauss-Seidel, though effective for high-frequency error components, has very little effect on low-frequency components. Multigrid (cf. [2]) capitalizes on this "smoothing" property of Gauss-Seidel by visiting coarser grids to resolve smooth errors. To accommodate the nonlinearities in (4), we use the full approximation scheme (FAS; cf. [2]) version of multigrid with bilinear interpolation and full weighting of residuals and approximations. (See [2] for details on FAS.)

In Table I, we show the results of various V-cycles applied to (4). In this example we used four grids; the finest was 33 by 17 and the coarser ones were 17 by 9, 9 by 5 and 5 by 3. We depict results of six FAS cycles in terms of the residual norms (we display the residual norms for x only, because the higher x resolution means that the y residuals are naturally much smaller) and the required "work units". (A work unit is a cost equivalent to one Gauss-Seidel sweep on the finest grid.) This is shown for four different V(ν1, ν2) cycles, where ν1 and ν2 are the numbers of relaxation sweeps performed before and after coarse grid correction, respectively. Note that V(1,1) seems most efficient, with an average per-work-unit residual reduction factor of about 0.51.


Table I - Comparison of various V-cycles applied to the EGG equations

FAS     V(1,1)              V(1,2)              V(2,1)              V(2,2)
cycle   Resid_X    WUs      Resid_X    WUs      Resid_X    WUs      Resid_X    WUs
1       0.952E-7   2.813    0.179E-7   4.063    0.291E-7   4.063    0.949E-8   5.313
2       0.641E-8   5.625    0.134E-8   8.125    0.262E-8   8.125    0.678E-9   10.63
3       0.819E-9   8.438    0.107E-9   12.19    0.230E-9   12.19    0.538E-10  15.94
4       0.101E-9   11.25    0.190E-10  16.25    0.504E-10  16.25    0.167E-10  21.25
5       0.264E-10  14.06    0.674E-11  20.31    0.186E-10  20.31    0.681E-11  26.56
6       0.822E-11  16.88    0.199E-11  24.38    0.731E-11  24.38    0.269E-11  31.88
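As a check on the quoted rate, the per-work-unit reduction factor can be recomputed from the first and last tabulated cycles of Table I. The initial residual is not tabulated, so this is only an estimate of the average rate; the dictionary and helper names are hypothetical.

```python
# First and last tabulated cycles of Table I: (resid_1, WU_1, resid_6, WU_6).
table_I = {
    "V(1,1)": (0.952e-7, 2.813, 0.822e-11, 16.88),
    "V(1,2)": (0.179e-7, 4.063, 0.199e-11, 24.38),
    "V(2,1)": (0.291e-7, 4.063, 0.731e-11, 24.38),
    "V(2,2)": (0.949e-8, 5.313, 0.269e-11, 31.88),
}


def per_wu_factor(r_first, wu_first, r_last, wu_last):
    """Geometric-mean residual reduction per work unit between two tabulated cycles."""
    return (r_last / r_first) ** (1.0 / (wu_last - wu_first))


factors = {name: per_wu_factor(*row) for name, row in table_I.items()}
for name, rate in sorted(factors.items(), key=lambda kv: kv[1]):
    print(f"{name}: {rate:.3f} per work unit")
```

V(1,1) comes out with the smallest factor (close to the 0.51 quoted above), which is what makes it the most efficient of the four cycles despite its smaller reduction per cycle.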
3.5. FAC for Local Refinement

In the vicinity of the leading edge of the airfoil and at shock waves produced by the flow problem, velocity gradients can be very large. Local refinement can be effective for providing more accurate results and finer details of the solution. Here we use the fast adaptive composite grid method (FAC; cf. [14]), which we test by placing a local 49-by-13 grid about the airfoil as shown in Fig. 2 (see [14] for details on FAC).

Table II shows residual norms and work units for each FAC cycle. Since FAC involves approximate solvers on each grid (both the coarse and the fine; here we use a single multigrid V(1,1) cycle as the basic grid solver), we show the residual norms for each. The correct error measure, however, is the composite grid residual norm (see Section 10), which we also show. The work units are measured in terms of the composite grid equations; hence the correspondence with V(1,1) in Table I.


Table II - Convergence history of FAC for EGG

FAC     Resid_X     Resid_X     Resid_X         WUs
cycle   on coarse   on fine     on composite
1       0.683E-8    0.164E-10   0.720E-8        2.813
2       0.952E-9    0.152E-10   0.102E-8        5.625
3       0.311E-9    0.157E-10   0.351E-9        8.438
4       0.141E-9    0.154E-10   0.161E-9        11.25
5       0.560E-10   0.150E-10   0.685E-10       14.06
6       0.212E-10   0.146E-10   0.357E-10       16.88
Table III - EGG discretization error estimates

I    J    hx       hy       ERRX_max  ERRY_max  ERRX_avg  ERRY_avg
17   17   0.0625   0.0625   0.0065    0.0060    0.0032    0.0032
33   33   0.0313   0.0313   0.0016    0.0015    0.0008    0.0008
65   65   0.0156   0.0156   0.0004    0.0004    0.0002    0.0002

3.6. Discretization Error Estimates

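In order to examine the accuracy of the numerical solution, we used the following test problem:

$$\alpha X_{\xi\xi} - 2\beta X_{\xi\eta} + \gamma X_{\eta\eta} = -2\pi^4 \sin\pi\xi \cos\pi\eta \left(\sin^2\pi\eta + \cos^2\pi\xi\right) \qquad (7)$$

$$\alpha Y_{\xi\xi} - 2\beta Y_{\xi\eta} + \gamma Y_{\eta\eta} = -2\pi^4 \cos\pi\xi \sin\pi\eta \left(\cos^2\pi\eta + \sin^2\pi\xi\right) \qquad (8)$$

with boundary conditions

$$\xi = 0:\ X = 0,\ Y = \sin\pi\eta; \qquad \xi = 1:\ X = 0,\ Y = -\sin\pi\eta;$$
$$\eta = 0:\ X = \sin\pi\xi,\ Y = 0; \qquad \eta = 1:\ X = -\sin\pi\xi,\ Y = 0.$$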
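The forcing terms in (7) and (8) are what one obtains by substituting the analytical solution (stated just below) into the EGG operator. The sketch checks this numerically with central differences, assuming the usual elliptic grid generation coefficients $\alpha = X_\eta^2 + Y_\eta^2$, $\beta = X_\xi X_\eta + Y_\xi Y_\eta$, $\gamma = X_\xi^2 + Y_\xi^2$ (their definition in (4) is not part of this excerpt). The function names and the step size d are illustrative choices.

```python
import math

PI = math.pi


def X(xi, eta):
    return math.sin(PI * xi) * math.cos(PI * eta)


def Y(xi, eta):
    return math.cos(PI * xi) * math.sin(PI * eta)


def derivatives(F, xi, eta, d=1e-4):
    """Central differences for F_xi, F_eta, F_xixi, F_xieta, F_etaeta."""
    f0 = F(xi, eta)
    f_xi = (F(xi + d, eta) - F(xi - d, eta)) / (2 * d)
    f_eta = (F(xi, eta + d) - F(xi, eta - d)) / (2 * d)
    f_xixi = (F(xi + d, eta) - 2 * f0 + F(xi - d, eta)) / d**2
    f_etaeta = (F(xi, eta + d) - 2 * f0 + F(xi, eta - d)) / d**2
    f_xieta = (F(xi + d, eta + d) - F(xi + d, eta - d)
               - F(xi - d, eta + d) + F(xi - d, eta - d)) / (4 * d**2)
    return f_xi, f_eta, f_xixi, f_xieta, f_etaeta


def egg_residuals(xi, eta):
    """Residuals of (7) and (8) at one point; they vanish up to differencing error."""
    X_xi, X_eta, X_xixi, X_xieta, X_etaeta = derivatives(X, xi, eta)
    Y_xi, Y_eta, Y_xixi, Y_xieta, Y_etaeta = derivatives(Y, xi, eta)
    alpha = X_eta**2 + Y_eta**2          # assumed coefficient definitions
    beta = X_xi * X_eta + Y_xi * Y_eta
    gamma = X_xi**2 + Y_xi**2
    s, c = math.sin, math.cos
    rhs7 = -2 * PI**4 * s(PI * xi) * c(PI * eta) * (s(PI * eta)**2 + c(PI * xi)**2)
    rhs8 = -2 * PI**4 * c(PI * xi) * s(PI * eta) * (c(PI * eta)**2 + s(PI * xi)**2)
    r7 = alpha * X_xixi - 2 * beta * X_xieta + gamma * X_etaeta - rhs7
    r8 = alpha * Y_xixi - 2 * beta * Y_xieta + gamma * Y_etaeta - rhs8
    return r7, r8
```

The residuals should be at the level of the finite-difference error (far below the size of the individual terms, which reach a few hundred), and the boundary values above follow by evaluating X and Y on the four edges.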

Its analytical solution is $X = \sin\pi\xi\cos\pi\eta$ and $Y = \cos\pi\xi\sin\pi\eta$. We then compared this solution with the results of applying several V(1,1) cycles on the three meshes listed in Table III. We used $L_\infty$ norms and average deviation estimates in both the X and Y directions. Note the very apparent $O(h^2)$ behavior of these errors.
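The $O(h^2)$ observation can be quantified by the observed order $p = \log(e_k/e_{k+1})/\log(h_k/h_{k+1})$ between successive meshes of Table III. Because the tabulated errors are rounded to four decimals, the computed orders are only approximate; the variable names are illustrative.

```python
import math

# Mesh sizes and three of the error columns of Table III.
h = [0.0625, 0.0313, 0.0156]
errors = {
    "ERRX_max": [0.0065, 0.0016, 0.0004],
    "ERRY_max": [0.0060, 0.0015, 0.0004],
    "ERRX_avg": [0.0032, 0.0008, 0.0002],
}


def observed_orders(h, err):
    """p = log(e_k / e_{k+1}) / log(h_k / h_{k+1}) for consecutive meshes."""
    return [math.log(err[k] / err[k + 1]) / math.log(h[k] / h[k + 1])
            for k in range(len(h) - 1)]


orders = {name: observed_orders(h, e) for name, e in errors.items()}
for name, p in orders.items():
    print(name, [round(x, 2) for x in p])
```

All observed orders come out close to 2, consistent with second-order accuracy.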


4. GOVERNING EQUATIONS

We will now apply the grid generation technique to the solution of the full potential equation, which in strong conservation form may be written

$$\frac{\partial(\rho u)}{\partial x} + \frac{\partial(\rho v)}{\partial y} = 0, \qquad (10)$$

$$\rho = \left[1 + \frac{\gamma - 1}{2}\,M_\infty^2\left(1 - u^2 - v^2\right)\right]^{1/(\gamma - 1)}, \qquad (11)$$

where $\rho$ is the density of the gas, $\gamma$ is the specific heat ratio and $M_\infty$ is the Mach number at infinity (i.e., the undisturbed fluid velocity normalized by the speed of sound). For potential flows, we may write the velocity components as $u = \Phi_x$, $v = \Phi_y$, where $\Phi$ is the velocity potential.
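A minimal sketch of the density relation (11), assuming $\gamma = 1.4$ (air; the value used in the computations is not stated in this excerpt) and velocities normalized by the free-stream speed, so that $\rho = 1$ in the undisturbed flow. That normalization is consistent with setting $\rho$ to unity at the far field in Section 6.5. The function name is hypothetical.

```python
def density(u, v, mach_inf, gamma=1.4):
    """Isentropic density (11).

    u, v are velocity components scaled so that the undisturbed stream has
    speed 1 (hence rho = 1 there). gamma = 1.4 is an assumed default.
    """
    base = 1.0 + 0.5 * (gamma - 1.0) * mach_inf**2 * (1.0 - u * u - v * v)
    if base <= 0.0:
        raise ValueError("speed exceeds the limiting (vacuum) speed")
    return base ** (1.0 / (gamma - 1.0))
```

For example, `density(1.2, 0.0, 0.75)` is below 1 (expansion where the flow speeds up), while a stagnation point `density(0.0, 0.0, 0.5)` gives a value above 1.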
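In computational coordinates $(\xi, \eta)$, (10) and (11) may be rewritten as

$$\frac{\partial(\rho U)}{\partial \xi} + \frac{\partial(\rho V)}{\partial \eta} = 0, \qquad (12)$$

$$\rho = \left[1 + \frac{\gamma - 1}{2}\,M_\infty^2\left(1 - \frac{U\Phi_\xi + V\Phi_\eta}{J}\right)\right]^{1/(\gamma - 1)}, \qquad (13)$$

where

$$U = A_{11}\Phi_\xi - A_{12}\Phi_\eta, \qquad V = A_{22}\Phi_\eta - A_{12}\Phi_\xi,$$

$$A_{11} = \frac{X_\eta^2 + Y_\eta^2}{J}, \qquad A_{22} = \frac{X_\xi^2 + Y_\xi^2}{J}, \qquad A_{12} = \frac{X_\xi X_\eta + Y_\xi Y_\eta}{J},$$

and $J = X_\xi Y_\eta - X_\eta Y_\xi$.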
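The step from (11) to (13) rests on the identity $(U\Phi_\xi + V\Phi_\eta)/J = u^2 + v^2$, which can be checked at a single point. The metric values and velocities below are arbitrary hypothetical numbers, and the helper names are illustrative.

```python
def metric_coefficients(x_xi, x_eta, y_xi, y_eta):
    """A11, A12, A22 and J of (12)-(13) from the grid metrics at one point."""
    J = x_xi * y_eta - x_eta * y_xi
    A11 = (x_eta**2 + y_eta**2) / J
    A12 = (x_xi * x_eta + y_xi * y_eta) / J
    A22 = (x_xi**2 + y_xi**2) / J
    return A11, A12, A22, J


def contravariant(phi_xi, phi_eta, A11, A12, A22):
    """U and V as defined after (13)."""
    return A11 * phi_xi - A12 * phi_eta, A22 * phi_eta - A12 * phi_xi


# Hypothetical metrics and physical velocity (u, v) = (Phi_x, Phi_y) at one point.
x_xi, x_eta, y_xi, y_eta = 1.3, 0.4, -0.2, 0.9
u, v = 0.8, 0.1
phi_xi = u * x_xi + v * y_xi       # chain rule: Phi_xi = Phi_x x_xi + Phi_y y_xi
phi_eta = u * x_eta + v * y_eta
A11, A12, A22, J = metric_coefficients(x_xi, x_eta, y_xi, y_eta)
U, V = contravariant(phi_xi, phi_eta, A11, A12, A22)
q2 = (U * phi_xi + V * phi_eta) / J   # the speed squared that enters (13)
```

The identity holds because the quadratic form with matrix $\begin{pmatrix}A_{11} & -A_{12}\\ -A_{12} & A_{22}\end{pmatrix}$ is $J^{-1}$ times the adjugate of the metric tensor, which undoes the chain-rule transformation of $(\Phi_x, \Phi_y)$.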


5. BOUNDARY CONDITIONS

5.1. On the wing surface and symmetry axis, the normal velocities are zero:

$$v_n = \frac{\partial \Phi}{\partial n} = 0. \qquad (14)$$

5.2. At downstream boundaries, free flow conditions are used:

$$\frac{\partial u}{\partial x} = 0 \quad \text{or} \quad \frac{\partial^2 \Phi}{\partial x^2} = 0.$$

5.3. Upstream and top boundaries are treated as far field, where we assume zero disturbance:

$$u = u_\infty \quad \text{and} \quad v = 0.$$

Fig. 4. Boundary conditions
6. DISCRETIZATION OF THE POTENTIAL EQUATION

The finite volume method is a convenient and accurate way to construct difference equations for points of the interior and boundary, especially for the interface points at the refinement regions. We discuss several cases as follows, where for simplicity we assume that each grid (whether global or local) is uniform, with equal mesh spacing in the $\xi$ and $\eta$ directions.

6.1. Interior Points
(Fig. 5) We rewrite (12) as

$$\int_T \nabla\cdot(\rho\mathbf{W})\,d\tau = \int_\Gamma \rho\mathbf{W}\cdot\mathbf{n}\,ds = 0, \qquad (15)$$

where $\mathbf{W}$ is the vector function with components U and V, T is a small control volume centered around an interior point $P(\xi_{i,j}, \eta_{i,j})$, $\Gamma$ is the boundary of T, which is assumed to be polygonal, and $\mathbf{n}$ is the outward unit vector normal to $\Gamma$. For any given line segment $\Gamma_0$ of $\Gamma$ we denote the flux

$$F_0 = \int_{\Gamma_0} \rho\mathbf{W}\cdot\mathbf{n}\,ds. \qquad (16)$$

By our assumptions on grid uniformity (for the computational plane $(\xi, \eta)$), we may take T to be a square centered at P, aligned with the grid, with sides of length $h = \xi_{i+1,j} - \xi_{i,j} = \eta_{i,j+1} - \eta_{i,j}$ (assume for simplicity that $h = 1$ for the finest grid). Then, using subscripts to denote the east, west, north and south segments of $\Gamma$, we can rewrite (15) as

$$F_e + F_w + F_n + F_s = 0. \qquad (17)$$

Fig. 5. Finite volume method for interior points

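Let $(\rho U)_e$ denote some approximation to $\rho U$ along the east boundary segment of T (e.g., $(\rho U)_e$ is some average value of $\rho U$ along e), and similarly for $(\rho U)_w$, $(\rho V)_n$ and $(\rho V)_s$. Then (17) may be approximated by the equation

$$(\rho U)_e - (\rho U)_w + (\rho V)_n - (\rho V)_s = 0.$$

Since

$$U = A_{11}\Phi_\xi - A_{12}\Phi_\eta, \qquad V = A_{22}\Phi_\eta - A_{12}\Phi_\xi,$$

we may use central difference approximations to the fluxes, yielding

$$
\begin{aligned}
&(\rho A_{11})_{i+\frac12,j}(\Phi_{i+1,j} - \Phi_{i,j}) - (\rho A_{12})_{i+\frac12,j}\,\tfrac14(\Phi_{i+1,j+1} + \Phi_{i,j+1} - \Phi_{i+1,j-1} - \Phi_{i,j-1}) \\
&- (\rho A_{11})_{i-\frac12,j}(\Phi_{i,j} - \Phi_{i-1,j}) + (\rho A_{12})_{i-\frac12,j}\,\tfrac14(\Phi_{i,j+1} + \Phi_{i-1,j+1} - \Phi_{i,j-1} - \Phi_{i-1,j-1}) \\
&+ (\rho A_{22})_{i,j+\frac12}(\Phi_{i,j+1} - \Phi_{i,j}) - (\rho A_{12})_{i,j+\frac12}\,\tfrac14(\Phi_{i+1,j+1} + \Phi_{i+1,j} - \Phi_{i-1,j+1} - \Phi_{i-1,j}) \\
&- (\rho A_{22})_{i,j-\frac12}(\Phi_{i,j} - \Phi_{i,j-1}) + (\rho A_{12})_{i,j-\frac12}\,\tfrac14(\Phi_{i+1,j} + \Phi_{i+1,j-1} - \Phi_{i-1,j} - \Phi_{i-1,j-1}) = 0,
\end{aligned}
$$

where

$$(\rho A_{11})_{i+\frac12,j} = \tfrac12\left[(\rho A_{11})_{i+1,j} + (\rho A_{11})_{i,j}\right], \qquad (\rho A_{11})_{i-\frac12,j} = \tfrac12\left[(\rho A_{11})_{i,j} + (\rho A_{11})_{i-1,j}\right],$$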

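and so on. This can be written in the general form

$$A_P\Phi_P = A_E\Phi_E + A_W\Phi_W + A_N\Phi_N + A_S\Phi_S + A_{NE}\Phi_{NE} + A_{NW}\Phi_{NW} + A_{SE}\Phi_{SE} + A_{SW}\Phi_{SW}, \qquad (20)$$

where

$$A_E = (\rho A_{11})_{i+\frac12,j} - \tfrac14\left[(\rho A_{12})_{i,j+\frac12} - (\rho A_{12})_{i,j-\frac12}\right],$$

$$A_W = (\rho A_{11})_{i-\frac12,j} + \tfrac14\left[(\rho A_{12})_{i,j+\frac12} - (\rho A_{12})_{i,j-\frac12}\right],$$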
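The coefficients of the general form (20) can be read off mechanically from the central-difference flux balance by applying it to unit grid functions. The sketch below is an illustration, not the authors' code: `interior_flux_sum` evaluates the left-hand side of the difference equation on a 3×3 patch, with the face-averaged values $(\rho A_{..})_{i\pm\frac12,j}$ and $(\rho A_{..})_{i,j\pm\frac12}$ passed in as plain numbers, and `stencil_coefficients` probes it to recover all nine coefficients. Both names and the sample face values are hypothetical.

```python
def interior_flux_sum(p, a11e, a11w, a22n, a22s, a12e, a12w, a12n, a12s):
    """Left-hand side of the interior difference equation at P = p[1][1].

    p[di][dj] holds Phi at (i-1+di, j-1+dj). a11e = (rho A11)_{i+1/2,j},
    a22n = (rho A22)_{i,j+1/2}, a12s = (rho A12)_{i,j-1/2}, and so on.
    """
    east = a11e * (p[2][1] - p[1][1]) - 0.25 * a12e * (p[2][2] + p[1][2] - p[2][0] - p[1][0])
    west = -a11w * (p[1][1] - p[0][1]) + 0.25 * a12w * (p[1][2] + p[0][2] - p[1][0] - p[0][0])
    north = a22n * (p[1][2] - p[1][1]) - 0.25 * a12n * (p[2][2] + p[2][1] - p[0][2] - p[0][1])
    south = -a22s * (p[1][1] - p[1][0]) + 0.25 * a12s * (p[2][1] + p[2][0] - p[0][1] - p[0][0])
    return east + west + north + south


def stencil_coefficients(face):
    """Recover A_nb and A_P of A_P Phi_P = sum A_nb Phi_nb by probing with unit patches."""
    names = {(2, 1): "E", (0, 1): "W", (1, 2): "N", (1, 0): "S",
             (2, 2): "NE", (0, 2): "NW", (2, 0): "SE", (0, 0): "SW", (1, 1): "P"}
    coeffs = {}
    for (di, dj), name in names.items():
        unit = [[0.0] * 3 for _ in range(3)]
        unit[di][dj] = 1.0
        c = interior_flux_sum(unit, **face)
        coeffs[name] = -c if name == "P" else c   # Phi_P moves to the left-hand side
    return coeffs


face = dict(a11e=1.1, a11w=0.9, a22n=1.2, a22s=0.8,
            a12e=0.3, a12w=0.2, a12n=0.25, a12s=0.15)
A = stencil_coefficients(face)
```

Since a constant $\Phi$ makes every difference vanish, $A_P$ necessarily equals the sum of the eight neighbor coefficients.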
6.2. Solid Wall Points

(Fig. 6) Because the south boundary is a solid wall, we have $F_s = 0$. From (17), we then get

$$F_e + F_w + F_n = 0, \qquad (21)$$

which is approximated by

$$(\rho U)_e - (\rho U)_w + (\rho V)_n = 0. \qquad (22)$$

Fig. 6. Finite volume method for solid wall points
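$$
\begin{aligned}
&\tfrac12(\rho A_{11})_{i+\frac12,j}(\Phi_{i+1,j} - \Phi_{i,j}) - (\rho A_{12})_{i+\frac12,j}\,\tfrac14(\Phi_{i+1,j+1} + \Phi_{i,j+1} - \Phi_{i+1,j} - \Phi_{i,j}) \\
&- \tfrac12(\rho A_{11})_{i-\frac12,j}(\Phi_{i,j} - \Phi_{i-1,j}) + (\rho A_{12})_{i-\frac12,j}\,\tfrac14(\Phi_{i,j+1} + \Phi_{i-1,j+1} - \Phi_{i,j} - \Phi_{i-1,j}) \\
&+ (\rho A_{22})_{i,j+\frac12}(\Phi_{i,j+1} - \Phi_{i,j}) - (\rho A_{12})_{i,j+\frac12}\,\tfrac14(\Phi_{i+1,j+1} + \Phi_{i+1,j} - \Phi_{i-1,j+1} - \Phi_{i-1,j}) = 0.
\end{aligned}
$$

This we write as

$$A_P\Phi_P = A_E\Phi_E + A_W\Phi_W + A_N\Phi_N + A_{NE}\Phi_{NE} + A_{NW}\Phi_{NW}, \qquad (24)$$

where

$$A_E = \tfrac12(\rho A_{11})_{i+\frac12,j} - \tfrac14\left[(\rho A_{12})_{i,j+\frac12} - (\rho A_{12})_{i+\frac12,j}\right],$$

$$A_W = \tfrac12(\rho A_{11})_{i-\frac12,j} + \tfrac14\left[(\rho A_{12})_{i,j+\frac12} - (\rho A_{12})_{i-\frac12,j}\right],$$

$$A_N = (\rho A_{22})_{i,j+\frac12} - \tfrac14\left[(\rho A_{12})_{i+\frac12,j} - (\rho A_{12})_{i-\frac12,j}\right],$$

$$A_{NE} = -\tfrac14\left[(\rho A_{12})_{i+\frac12,j} + (\rho A_{12})_{i,j+\frac12}\right],$$

$$A_{NW} = \tfrac14\left[(\rho A_{12})_{i-\frac12,j} + (\rho A_{12})_{i,j+\frac12}\right],$$

$$A_P = A_E + A_W + A_N + A_{NE} + A_{NW}.$$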
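The same probing idea confirms the wall coefficients: the sketch evaluates the half-cell equation above on a 3×2 patch (rows j and j+1) and compares the recovered coefficients with the closed forms listed for (24). The function names and face values are hypothetical.

```python
def wall_flux_sum(p, a11e, a11w, a22n, a12e, a12w, a12n):
    """Left-hand side of the half-cell wall equation at P = p[1][0].

    p[di][dj] holds Phi at (i-1+di, j+dj), dj = 0 on the wall, 1 above it.
    """
    east = 0.5 * a11e * (p[2][0] - p[1][0]) - 0.25 * a12e * (p[2][1] + p[1][1] - p[2][0] - p[1][0])
    west = -0.5 * a11w * (p[1][0] - p[0][0]) + 0.25 * a12w * (p[1][1] + p[0][1] - p[1][0] - p[0][0])
    north = a22n * (p[1][1] - p[1][0]) - 0.25 * a12n * (p[2][1] + p[2][0] - p[0][1] - p[0][0])
    return east + west + north


def wall_coefficients(face):
    """A_E, A_W, A_N, A_NE, A_NW and A_P of (24), recovered from unit patches."""
    names = {(2, 0): "E", (0, 0): "W", (1, 1): "N", (2, 1): "NE", (0, 1): "NW", (1, 0): "P"}
    coeffs = {}
    for (di, dj), name in names.items():
        unit = [[0.0, 0.0] for _ in range(3)]
        unit[di][dj] = 1.0
        c = wall_flux_sum(unit, **face)
        coeffs[name] = -c if name == "P" else c
    return coeffs


face = dict(a11e=1.1, a11w=0.9, a22n=1.2, a12e=0.3, a12w=0.2, a12n=0.25)
A = wall_coefficients(face)
```

Note that on the wall the east and west $A_{11}$ terms carry a factor ½ (half-length faces), while the cross terms of the east and west faces couple directly to $\Phi_E$, $\Phi_W$ and $\Phi_P$.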


6.3. Other Boundary Points

For upstream and top boundary points, we simply keep $\Phi$ unchanged, i.e.,

$$\Phi = u_\infty x + v_\infty y.$$

For downstream boundary points, we implicitly embed the free flow condition $\Phi_{xx} = 0$ into column $I - 1$ (next to the downstream boundary).


6.4.  Composite  Grid  Interface  Points 

For  the  composite  grid  (Fig.  7)  we  must  be  careful  in  our  treatment  of  the 
interface  points  between  the  coarse  and  fine  grid  points.  We  use  (17)  to 
develop  the  following  equation  for  point  P  (Fig.  8): 
<p*n'1  j  ,  -  «'»*!*>,  ,

1+2>J  1+2'-> 

-  «**»>,  .  <V»>  *  1  ,•  r,%*%w-*s-,s»' 

1  2,J  1  2'^ 

*  l'>A22>  ,  1  <*»-*,> 

1  ’ 3*2  1 ' 3*2 


t  1, 


-  i  1<W  +  (PA12)|  ,  lt#SE-  ^p^sV 

l.J^  i.J-j 


sw 


)] 


0. 


(26) 


Fig. 7. Composite grid (legend: coarse grid points, fine grid points, interface points, boundary points)

Fig. 8. Difference schemes for interface points
for  interface  points 
Here, superscript f is used to specify fine grid quantities. This we write as

$$A_P\Phi_P = A_W\Phi_W + A_N\Phi_N + A_S\Phi_S + A_{NW}\Phi_{NW} + A_{SW}\Phi_{SW} + A_E\Phi_E + A_{NE}\Phi_{NE} + A_{SE}\Phi_{SE}, \qquad (27)$$

where


AW=  1  ,  1  '  (pA12>.  ,  1]’ 

1  .  J’Ko 


i  2  ■  J 


U-i 


AW  “  (PAoo)  1  +  t[(PAio)  1  +  (PA1?)  1 

N  22  i(j+|  4  12  i^J  12  ij+| 


AS  *  (pA22> 


2 


1  -?«pA12).  1  .  +  (PA12> 


]. 

). 


ANW  -  4[<PA12>  1  ,  +  <PA12>  .  ll- 

*  2  *  J  ^  » J  g 


and 


SW 


-7(I',A12,1  1  .  *  <**«>  ,  1 

*  2  *  J  » J  —  g 


). 


AE  -  2<PAn>  !  • 

‘41 

ANE  =  ~^pAi2^  1  +  ^pA12^  1’ 

NE  12  i+J.J  U+j 

ASE  =  (pA12,1  1  ..  +  (PA12),  ,  1' 


1+2 -J 


$$A_P = A_W + A_N + A_S + A_{NW} + A_{SW} + A_E + A_{NE} + A_{SE}.$$


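6.5. Computation of the Density ρ

In order to calculate the coefficients of $\Phi$ in (20), (24) and (27), we must first compute the density $\rho$. For interior points, we use the expression

$$\rho = \left\{1 + \frac{\gamma - 1}{2}\,M_\infty^2\left[1 - \frac{(A_{11}\Phi_\xi - A_{12}\Phi_\eta)\Phi_\xi + (A_{22}\Phi_\eta - A_{12}\Phi_\xi)\Phi_\eta}{J}\right]\right\}^{1/(\gamma - 1)}. \qquad (28)$$

At the far field boundaries, $\rho$ is set to unity. On the solid wall, the zero-normal-velocity condition should be used, namely

$$\nabla\Phi\cdot\mathbf{n} = 0, \quad \text{i.e.,} \quad \nabla\Phi\cdot\nabla\left[y - f(x)\right] = 0, \qquad (29)$$

where $y = f(x)$ is the airfoil contour function. Hence,

$$-\Phi_x f'(x) + \Phi_y = 0$$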
important differences:

i) We used two different kinds of weighting operations: a partial weighting (35) and the full weighting

$$\frac{1}{16}\begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix}. \qquad (36)$$

$\hat I_h^{2h}$ was used to transfer the unknown $\Phi^h$ to coarse levels, and $I_h^{2h}$ was used to transfer residuals to coarser levels. For reasons that are not yet clear, this combination showed the best performance of the standard options tested.
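For reference, here is a sketch of the full-weighting stencil (36) applied to a 2-D grid function. The partial-weighting stencil (35) did not survive in this copy, so only (36) is shown; injecting the boundary values is an assumption made only to keep the example self-contained (boundary transfers are discussed separately below). The function name is hypothetical.

```python
FULL_WEIGHTING = [[1 / 16, 1 / 8, 1 / 16],
                  [1 / 8,  1 / 4, 1 / 8],
                  [1 / 16, 1 / 8, 1 / 16]]


def restrict_full_weighting(v):
    """Restrict v[i][j] on a (2m+1) x (2n+1) grid to (m+1) x (n+1) with stencil (36).

    Interior coarse points use the 9-point weights; boundary points are
    injected (an assumption for this sketch).
    """
    nx, ny = (len(v) - 1) // 2, (len(v[0]) - 1) // 2
    vc = [[v[2 * I][2 * J] for J in range(ny + 1)] for I in range(nx + 1)]
    for I in range(1, nx):
        for J in range(1, ny):
            vc[I][J] = sum(FULL_WEIGHTING[a + 1][b + 1] * v[2 * I + a][2 * J + b]
                           for a in (-1, 0, 1) for b in (-1, 0, 1))
    return vc
```

The weights sum to one and are symmetric, so constants and bilinear functions are restricted without change.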


Table IV - Convergence history of FAC for potential flows

         M∞ = 0.5                                        M∞ = 0.75
FAC      Residual   Residual   Residual      WUs         Residual   Residual   Residual      WUs
cycle    on coarse  on fine    on composite              on coarse  on fine    on composite
1        0.702E-4   0.239E-5   0.106E-3      5.625       0.146E-3   0.635E-4   0.174E-3      5.625
2        0.686E-5   0.135E-5   0.150E-4      11.25       0.270E-4   0.372E-4   0.290E-4      11.25
3        0.934E-6   0.574E-6   0.228E-5      16.88       0.666E-5   0.128E-4   0.760E-5      16.88
4        0.161E-6   0.967E-7   0.344E-6      22.5        0.208E-5   0.173E-5   0.198E-5      22.5
5        0.249E-7   0.169E-7   0.532E-7      28.13       0.610E-6   0.155E-6   0.584E-6      28.13
6        0.404E-8   0.923E-8   0.814E-8      33.75       0.184E-6   0.354E-7   0.172E-6      33.75
7        -          -          -             -           0.543E-7   0.909E-8   0.508E-7      39.38
8        -          -          -             -           0.155E-7   0.720E-8   0.147E-7      -
ii) Using (20) to construct the coarse grid operators $L^{2h}$, the quantities $A_{11}^{2h}$, $A_{12}^{2h}$ and $A_{22}^{2h}$, which depend only on the physical node distribution, can be defined simply by restricting the corresponding grid h quantities. On the other hand, $\rho^{2h}$, which depends on the unknown $\Phi^{2h}$, should be calculated by (28) and (32). Initially, (28) would be evaluated using the starting guess $\Phi^{2h} = \hat I_h^{2h}\Phi^h$.

iii) In the transonic flow region, we use an artificial viscosity which is related to the local Mach number M. For compatibility in the solution process, we geometrically align the transonic regions on all levels by defining $M^{2h}$ to be the restriction of $M^h$ to grid 2h. (Careful treatment of the Mach number as a nonlinearity could lead to a more efficient scheme, especially for the case of sharp shocks. However, this approach is potentially much more complex, with several open questions remaining, so it will be left for further study.)

iv) A critical aspect of the solution process is careful treatment of the boundaries. For Dirichlet boundary conditions, we only need to use $\hat I_h^{2h}$ for all fine-to-coarse transfers of $\Phi$ in the usual way; we do not need to use $I_h^{2h}$, $L^h$, $L^{2h}$ or $I_{2h}^h$ at the boundaries. At Neumann boundaries, we should use the difference operators $L^h$ and $L^{2h}$, the coarse-to-fine interpolation operator $I_{2h}^h$ and a new $I_h^{2h}$ for residual transfer. In particular, for $I_h^{2h}$ we may use the 6-point stencil (Fig. 9)

$$\begin{bmatrix} \frac{1}{16} & \frac{1}{8} & \frac{1}{16} \\[2pt] \frac{1}{8} & \frac{1}{2} & \frac{1}{8} \end{bmatrix} \qquad (37)$$

for the restriction of residuals to the coarse grid at the wing surface.

Fig. 9. 6-point stencil for residual transfer on solid wall


9. COMPUTATIONAL RESULTS

Three flow examples around the NACA 0012 airfoil are investigated in this paper, using as solvers a global grid with relaxation only and with a 4-level FAS V(1,1)-cycle. The free stream Mach numbers used for these examples were $M_\infty$ = 0.0 (incompressible), 0.5 (subsonic), and 0.75 (transonic). The finest grid was 33 by 17. Comparisons of the performance of relaxation and multigrid are depicted in Figures 10, 11 and 12, respectively. Convergence is good for incompressible and subsonic flows, where the residual reduction factor per multigrid cycle (equivalent to about 5.625 work units) is less than 0.1. In particular, it costs only 28.125 work units to achieve residual norms of $0.3 \times 10^{-8}$ and $0.42 \times 10^{-8}$ for $M_\infty = 0.0$ and 0.5, respectively. (These experiments were made without the use of FMG, to allow comparison with earlier work.) However, our implementation of multigrid is slower for transonic flow. (This is probably due to the way in which we treat the shock wave.) Here, 45 work units are needed to achieve a residual norm of 0.62 × 10^[...] with an artificial viscosity of k = 2.0·Max(0, 1 − [...]) and a cutoff Mach number of $M_c = 0.85$. Increasing the cutoff Mach number $M_c$ leads to a sharper shock wave, at the expense, however, of more work units.

To test the effectiveness of FAC, we placed a 49-by-13 grid about the airfoil (Fig. 3) and ran several cycles, using the multigrid solver described above on each grid. Note that FAC performance on this composite grid is similar to multigrid performance on the global grid. To test resolution, we graphed the pressure distribution on the wing surface in Figures 13 and 14, respectively, for the global grid and for the composite grid tests. The greater accuracy obtained by FAC is evident in the increased sharpness of the shock wave and the attenuation of the oscillation behind the shock.

Liu and McCormick

Fig. 14. Pressure distribution produced by FAC. Global grid: 33 × 17, $M_\infty$ = 0.75. Refined grid: 41 × 13.

10. CONCLUDING REMARKS

1. Multigrid and FAC can successfully be used in combination with elliptic grid generation to produce a body-fitted coordinate system with local refinement.

2. Multigrid and FAC can be used for nonlinear problems and Neumann boundary conditions with subsonic flows just as successfully as for linear Dirichlet problems, provided we are careful with the difference schemes (e.g., the finite volume method) at both interior and boundary points, and with the boundary operators and computational details.

3. The efficiency of our implementation of multigrid for transonic flows is not as good as it is for the subsonic problem, due primarily to the way we currently treat the discontinuity. But with proper use of artificial viscosity and scheme switching criteria, convergence is still quite reasonable. True multigrid efficiency (for all values of $M_c$) may be obtainable by proper FAS treatment of the local Mach number. We intend to explore this further.

4. For the problems we have treated, it might be suggested that a fully conservative discretization technique and a compatible numerical solution scheme might prove more effective. For the potential flow equations, this is just what we do. For EGG, such an approach is possible but probably not critical, because these equations seem quite well posed. Nevertheless, we are currently exploring this approach.

5. FAC is a convenient and efficient method for refining the computational domain in order to obtain more accuracy and better resolution.


REFERENCES

[1] E. M. Murman and J. D. Cole, "Calculation of plane steady transonic flows", AIAA J., Vol. 9, No. 1, p. 114 (1970).

[2] A. Brandt, "Multi-level adaptive solution to boundary-value problems", Mathematics of Computation, Vol. 31, p. 333-390 (1977).

[3] A. Brandt, "Multi-level adaptive computations in fluid dynamics", AIAA paper, Williamsburg, VA (1979).

[4] A. Jameson, "Acceleration of transonic potential flow calculation on arbitrary meshes by the multigrid method", Proc. of AIAA 4th Computational Fluid Dynamics Conference, p. 122-146 (1979).

[5] D. R. McCarthy and T. A. Reyhner, "Multigrid code for 3-D transonic potential flow about inlets", AIAA J., Vol. 20, p. 45-50 (1982).

[6] J. W. Boerstoel, "A multigrid algorithm for steady transonic flows around aerofoils using Newton iteration", in Multigrid Methods, NASA CP 2202 (1981).

[7] D. C. Jespersen, "A multigrid method for the Euler equations", AIAA 21st Aerospace Sciences Meeting, Reno, NV, AIAA-83-0124 (1983).

[8] R. H. Ni, "A multiple grid scheme for solving the Euler equations", AIAA J., Vol. 20, p. 1565-1571 (1982).

[9] A. Shmilovich and D. Caughey, "Application of the multigrid method to calculation of transonic potential flow about wing-fuselage combinations", in Multigrid Methods, NASA CP 2202 (1981).

[10] G. M. Johnson, "Multigrid acceleration of Lax-Wendroff algorithm", NASA TM-82843 (1982).

[11] N. L. Sankar, "A multigrid strongly implicit procedure for 2-D transonic potential flow problems", AIAA-82-0931 (1982).

[12] J. F. Thompson, Z. U. A. Warsi and C. W. Mastin, Numerical Grid Generation, Fundamentals and Applications, North Holland (1985).

[13] R. K. Jain, "Generation of body-fitted grids around airfoil using multigrid method", Proc. of International Conference held at Landshut, West Germany, July 14-17, 1986.

[14] S. McCormick and J. Thomas, "The fast adaptive composite grid (FAC) method for elliptic equations", Mathematics of Computation, Vol. 46, No. 174, p. 439-456 (1986).

[15] T. L. Holst and W. T. Ballhaus, "Conservative implicit schemes for the full potential equation applied to transonic flows", NASA TM-78469 (1978).
Method for Three-Dimensional Elasticity

Fourier analysis of model problems is used to estimate constants in convergence bounds for a multigrid algorithm. An alternative approach to Fourier analysis of multigrid methods is developed, allowing an easy generalization to non-homogeneous high-order discretizations and facilitating computer-aided analysis.

1. INTRODUCTION

Theoretical analysis of multigrid methods should be used as a tool to predict their behavior and to identify possible problems. Unfortunately, rigorous mathematical analysis generally yields overly pessimistic bounds, which, in addition, depend on unknown constants arising from elliptic regularity [1-4, 8-11]. On the other hand, local mode analyses give reasonably close estimates, which are exact for certain model problems [13], but can fail in the general case. A hybrid approach has been adopted in [10] to estimate some constants from the rigorous theory, yielding reasonably sharp rigorous bounds for model problems and the multigrid V-cycles generally used in practice. To our present knowledge, there is no simple way to obtain h-independent V-cycle estimates from Fourier analysis directly, because a full cycle unavoidably couples almost all frequencies.

Sponsored in part by the Air Force Office of Scientific Research under Contract No. AFOSR-86-0126 and the Department of Energy under grant DE-AC03-84-ER80155.

In this paper, we pursue this hybrid approach and analyze the multigrid solution of the linear elasticity problem in three dimensions discretized by tensor product linear finite elements. The complexity involved in a three-dimensional vector problem and the need to be able to extend the analysis to quadratic elements led us to an alternative approach to Fourier analysis of multigrid methods, which offers a more systematic analysis and simpler programming. The basic idea is to use 2h-modes throughout and to assign separate waves (but with the same frequency) to periodic subsets of h-grid nodal variables. This approach can be easily extended, e.g., to discretizations with midpoint nodes, which will be studied elsewhere.

The  paper  is  organized  as  follows:  In  Section  2,  we  state 
the  multigrid  algorithm  in  an  abstract  form  and  review  some 
theoretical  results.  Our  approach  to  Fourier  analysis  is 
illustrated  on  the  one-dimensional  Poisson  equation  in 
Section  3.  Section  4  contains  a  treatment  of  the  3-D  linear 
elasticity  problem  with  periodic  boundary  conditions. 

2.  THE  MULTIGRID  ALGORITHM  AND  CONVERGENCE  BOUNDS 

The  material  in  this  section  is  included  only  for 
reference.  For  details  and  extensions,  see  [9,10]. 
Let $H_h$, $h = h_0, h_0/2, \ldots, h_m$, be finite dimensional spaces equipped with inner products $\langle\cdot,\cdot\rangle_h$ and norms $\|u\|_h = \langle u, u\rangle_h^{1/2}$, respectively, and $\dim H_{h_0} < \dim H_{h_0/2} < \cdots < \dim H_{h_m}$. These spaces are linked by full-rank linear mappings $I_h^{2h}: H_h \to H_{2h}$, called restrictions, and $I_{2h}^h: H_{2h} \to H_h$, called prolongations. In the following sections, the spaces $H_h$, $h = h_0, h_0/2, h_0/4, \ldots, h_m = h_0 2^{-m}$, will be spaces of grid functions with characteristic spacing h. The theory reviewed in this section, however, does not depend on this particular interpretation. Linear mappings $L_h: H_h \to H_h$ are given, and we are interested in a fast iterative solution of the problem

$$L_h u_h = f_h \qquad (2.1)$$

for $h = h_m$. The problems (2.1) for $h > h_m$ are auxiliary. Let

$$u_h \leftarrow G_h(u_h, f_h)$$

be a consistent iterative method for the solution of (2.1). For simplicity, we restrict ourselves to the Richardson iteration

$$u_h \leftarrow G_h(u_h, f_h) = u_h - \frac{1}{\rho(L_h)}\,(L_h u_h - f_h).$$

Let $\nu > 0$ and $\gamma > 0$ be fixed integers. A simple version of the multigrid algorithm $u_h \leftarrow MG_h(u_h, f_h)$ is defined as follows:
a. If $h = h_0$, then $u_h \leftarrow L_h^{-1} f_h$.

b. If $h < h_0$, then:

Step 1: Perform $u_h \leftarrow G_h(u_h, f_h)$ $\nu$ times.

Step 2: Set $u_{2h} = 0$, $f_{2h} = I_h^{2h}(f_h - L_h u_h)$, and perform $u_{2h} \leftarrow MG_{2h}(u_{2h}, f_{2h})$ $\gamma$ times. Then set $u_h \leftarrow u_h + I_{2h}^h u_{2h}$.

Step 3: Perform $u_h \leftarrow G_h(u_h, f_h)$ $\nu$ times.

This algorithm is called a V-cycle for $\gamma = 1$ and a W-cycle for $\gamma = 2$.
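To make the recursion concrete, here is a sketch of $MG_h$ for the periodic model problem of Section 3 (discretization (3.2), restriction (3.3), linear interpolation) with the Richardson smoother above, using $\rho(L_h) = 4/h^2$ for even n. The grid is indexed 0, ..., n-1 with coarse points at even indices, the right-hand side is taken orthogonal to constants because $L_h$ is singular, and the coarsest level (n = 2) is handled by one Richardson step, which is exact there on the mean-zero subspace. These choices and all function names are assumptions for illustration.

```python
def apply_L(u, h):
    """(3.2): L_h u(x_j) = (-u_{j+1} + 2 u_j - u_{j-1}) / h^2, periodic."""
    n = len(u)
    return [(-u[(j + 1) % n] + 2 * u[j] - u[(j - 1) % n]) / h**2 for j in range(n)]


def richardson(u, f, h, steps):
    """u <- u - (1/rho(L_h)) (L_h u - f), with rho(L_h) = 4/h^2 for even n."""
    omega = h * h / 4.0
    for _ in range(steps):
        Lu = apply_L(u, h)
        u = [ui - omega * (li - fi) for ui, li, fi in zip(u, Lu, f)]
    return u


def restrict(r):
    """(3.3): (1/4)[1 2 1] centred at the even (coarse) points, periodic."""
    n = len(r)
    return [0.25 * (r[(2 * m - 1) % n] + 2 * r[2 * m] + r[2 * m + 1]) for m in range(n // 2)]


def prolong(uc):
    """Linear interpolation, periodic."""
    nc = len(uc)
    u = [0.0] * (2 * nc)
    for m in range(nc):
        u[2 * m] = uc[m]
        u[2 * m + 1] = 0.5 * (uc[m] + uc[(m + 1) % nc])
    return u


def mg(u, f, h, nu=1, gamma=1):
    """MG_h with a Richardson smoother: gamma = 1 is a V-cycle, gamma = 2 a W-cycle."""
    if len(u) == 2:
        # Coarsest grid: one Richardson step solves L u = f on the mean-zero subspace.
        return richardson(u, f, h, 1)
    u = richardson(u, f, h, nu)                               # Step 1
    f2 = restrict([fi - li for fi, li in zip(f, apply_L(u, h))])
    u2 = [0.0] * (len(u) // 2)
    for _ in range(gamma):                                    # Step 2
        u2 = mg(u2, f2, 2 * h, nu, gamma)
    u = [ui + ei for ui, ei in zip(u, prolong(u2))]
    return richardson(u, f, h, nu)                            # Step 3
```

Because the restriction of a mean-zero residual is again mean-zero, every coarse problem stays solvable despite the singular operator.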

We shall be concerned with the case when $L_h$ is symmetric, positive definite and the following variational conditions hold:

$$L_{2h} = I_h^{2h} L_h I_{2h}^h, \qquad I_h^{2h} = \left(I_{2h}^h\right)^*, \qquad (2.2)$$

where $^*$ denotes the adjoint relative to $\langle\cdot,\cdot\rangle_h$ and $\langle\cdot,\cdot\rangle_{2h}$. Then it turns out that the natural norm for measuring the convergence factor of $MG_h$ is the energy norm defined by

$$|||u_h|||_h = \langle L_h u_h, u_h\rangle_h^{1/2}, \qquad u_h \in H_h.$$

Set

$$T_h = I - I_{2h}^h L_{2h}^{-1} I_h^{2h} L_h. \qquad (2.3)$$

It follows from (2.2) that $T_h$ is the $L_h$-orthogonal projection onto the complement of the range of $I_{2h}^h$. Set

$$\delta_h = \rho(L_h)\,\rho\!\left(T_h L_h^{-1}\right), \qquad \delta = \max_{h_m \le h < h_0} \delta_h. \qquad (2.4)$$
Then the results of the theory in [8, 9, 10] give, as a particular case, the convergence bound

$$|||u_h^* - MG_h(u_h, f_h)|||_h \le \varepsilon\,|||u_h^* - u_h|||_h \quad \text{for any } u_h \in H_h,$$

where $u_h^* = L_h^{-1} f_h$, and

£  *  1»WI  lf  r>i,  u  >  i.  (2.5) 

For other bounds based on the quantity $\delta$, see [8, 10, 11]. This paper will be mostly concerned with the numerical computation of the quantity $\delta$ by means of Fourier analysis of model problems. It should be noted that for second order elliptic boundary value problems and usual discretizations, $\delta_h$ will be bounded independently of the characteristic mesh size h if the boundary value problem is $H^2$-regular. If we have only $H^{1+\alpha}$ regularity with some $\alpha \in (0,1)$, then $\gamma > 1$ (i.e., the W-cycle) is required in order to bound the convergence estimate $\varepsilon$ away from one independently of h (at least in all current theories).
(For  a  discussion  of  this  rather  controversial  subject,  see 

W.) 

3. A ONE-DIMENSIONAL PROLOGUE

We illustrate the main ideas on a simple model problem

$$-u'' = f \quad \text{in } \Omega = (0, b) \qquad (3.1)$$

with periodic boundary conditions. Let $n > 0$ be even, let $h = b/n$, and let

$$\Omega_h = \{x_j : x_j = jh,\ j = 1, 2, \ldots, n\},$$

and define $\tilde H_h$ as the space of grid functions

$$\tilde H_h = \{u_h : \Omega_h \to \mathbb{C}\}.$$

The equation (3.1) is discretized by central differencing, $L_h u_h = f_h$, with $f_h(x_j) = f(x_j)$ and

$$L_h u_h(x_j) = \frac{1}{h^2}\left[-u_h(x_{j+1}) + 2u_h(x_j) - u_h(x_{j-1})\right], \qquad (3.2)$$

where we set by periodicity $u_h(x_0) = u_h(x_n)$ and $u_h(x_{n+1}) = u_h(x_1)$. It is easy to see that $L_h$ is singular and that its null space consists of constant functions. The space $\tilde H_h$ is equipped with the inner product

$$\langle u_h, v_h\rangle_h = h\sum_{j=1}^{n} u_h(x_j)\,\overline{v_h(x_j)}.$$

We may thus define $H_h$ as the orthogonal complement of the null space of $L_h$,

$$H_h = \Big\{u_h \in \tilde H_h : \sum_{j=1}^{n} u_h(x_j) = 0\Big\},$$

or, equivalently, as the factor space modulo the null space of $L_h$. Replacing everywhere n by n/2 and h by 2h, we define similarly $\Omega_{2h}$, $\tilde H_{2h}$, $H_{2h}$, $\langle\cdot,\cdot\rangle_{2h}$ and $L_{2h}$. The restriction operator $I_h^{2h}$ is given by

$$I_h^{2h} u_h(x_j) = \tfrac14\left[u_h(x_{j+1}) + 2u_h(x_j) + u_h(x_{j-1})\right], \qquad x_j = jh \in \Omega_{2h}\ \text{(i.e., } j \text{ even)}. \qquad (3.3)$$

The prolongation operator $I_{2h}^h$ is given by linear interpolation, i.e.,

$$I_{2h}^h u_{2h}(x_j) = u_{2h}(x_j), \qquad I_{2h}^h u_{2h}(x_{j+1}) = \tfrac12\left[u_{2h}(x_j) + u_{2h}(x_{j+2})\right] \qquad \text{for } j \text{ even}.$$

Then the variational conditions (2.2) are satisfied.
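For a small case (n = 8) this can be verified numerically by building the matrices of $L_h$, $I_h^{2h}$ and $I_{2h}^h$ with 0-based periodic indexing. The adjoint with respect to $h\sum$ and $2h\sum$ is $\tfrac12(I_{2h}^h)^T$, and the Galerkin product $I_h^{2h} L_h I_{2h}^h$ should reproduce $L_{2h}$, i.e. (3.2) with n/2 points and spacing 2h. The helper names are hypothetical.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def laplacian_matrix(n, h):
    """Matrix of (3.2) on n periodic points (n >= 3)."""
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        L[j][j] = 2.0 / h**2
        L[j][(j + 1) % n] -= 1.0 / h**2
        L[j][(j - 1) % n] -= 1.0 / h**2
    return L


def prolongation_matrix(n):
    """Linear interpolation from n/2 coarse points to n fine points, periodic."""
    nc = n // 2
    P = [[0.0] * nc for _ in range(n)]
    for m in range(nc):
        P[2 * m][m] = 1.0
        P[2 * m + 1][m] += 0.5
        P[2 * m + 1][(m + 1) % nc] += 0.5
    return P


def restriction_matrix(n):
    """(3.3): weights 1/4, 1/2, 1/4 centred at the even fine points."""
    nc = n // 2
    R = [[0.0] * n for _ in range(nc)]
    for m in range(nc):
        R[m][(2 * m - 1) % n] += 0.25
        R[m][2 * m] += 0.5
        R[m][2 * m + 1] += 0.25
    return R


n, h = 8, 1.0 / 8
P, R = prolongation_matrix(n), restriction_matrix(n)
galerkin = matmul(R, matmul(laplacian_matrix(n, h), P))
direct = laplacian_matrix(n // 2, 2 * h)
```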

The grid functions $x_j \mapsto e^{i\theta x_j/h}$ are invariant under the operator $L_h$:

$$L_h\, e^{i\theta x_j/h} = \frac{2(1 - \cos\theta)}{h^2}\, e^{i\theta x_j/h},$$

by (3.2) and $x_j = jh \in \Omega_h$. Periodic boundary conditions imply that $\theta n = 2k\pi$, $k \in \mathbb{Z}$.

In the classical approach [11, 13], it turns out that only the modes with frequencies $\theta$ and $\theta + \pi$ are coupled by the coarse grid correction operator $T_h$. In our notation, this means that the span of the functions $x \mapsto e^{i\theta x/h}$ and $x \mapsto e^{i(\theta+\pi)x/h}$ (for a fixed $\theta = 2k\pi/n$) is an invariant subspace of the coarse grid correction operator $T_h$. Thus the computation of $\delta$ can be reduced to the computation of spectral radii of 2×2 matrices over the range $-\frac{\pi}{2} < \theta \le \frac{\pi}{2}$; see [10]. Similarly, one can compute other characteristics of the multigrid process, such as the spectral radius or norm of the 2-grid error transformation operator

$$\left(I - \frac{1}{\rho(L_h)}L_h\right)^{\nu} T_h \left(I - \frac{1}{\rho(L_h)}L_h\right)^{\nu},$$

cf. [13]. These subspaces are also invariant under some other smoothers, such as Red-Black relaxation; see [13] for an analysis for the two-dimensional Poisson equation.

Instead of complex exponentials, we use functions defined in the following way. Define the grid splitting functions $\phi_k: \Omega_h \to \mathbb{R}$, $k = 0,1$, as follows:

$$\phi_k(x_j) = \begin{cases} 1 & \text{if } j-k \equiv 0 \pmod 2,\\ 0 & \text{otherwise.}\end{cases}$$

Then set

$$\psi_{k,\theta}(x_j) = \phi_k(x_j)\, e^{i\theta x_j/h}, \qquad \theta = \frac{2k\pi}{n}, \quad -\frac{\pi}{2} < \theta \le \frac{\pi}{2}.$$

The functions $\psi_{k,\theta}$ form an orthogonal basis of the space $H_h$. This follows immediately from the fact that the complex exponentials $x \mapsto e^{i\theta x/h}$, $\theta = 2k\pi/n$, $-\pi/2 < \theta \le \pi/2$, form an orthogonal basis of $H_{2h}$.

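For a fixed $\theta$, the subspace $E_\theta = \operatorname{span}\{\psi_{k,\theta}: k = 0,1\}$ is an invariant subspace of $L_h$ as well as of $T_h$. Indeed, for any stencil operator $A_h$ given by

$$A_h u_h(x_j) = a_{-1}\,u_h(x_{j-1}) + a_0\,u_h(x_j) + a_1\,u_h(x_{j+1}),$$

we have

$$(A_h \psi_{\ell,\theta})(x_j) = \sum_{k=-1}^{1} a_k\, \psi_{\ell,\theta}(x_{j+k}) = \sum_{k=-1}^{1} a_k\, \phi_\ell(x_{j+k})\, e^{i\theta x_{j+k}/h} = e^{i\theta x_j/h}\, b_{j\ell}(\theta), \qquad (3.4)$$

where, with the first index of $b$ taken modulo 2,

$$b_{j\ell}(\theta) = \sum_{\substack{k=-1\\ \ell \equiv j+k \ (\mathrm{mod}\ 2)}}^{1} a_k\, e^{ik\theta}. \qquad (3.5)$$

Consequently,

$$A_h \psi_{\ell,\theta} = b_{0\ell}(\theta)\,\psi_{0,\theta} + b_{1\ell}(\theta)\,\psi_{1,\theta},$$

so the $2\times 2$ matrix

$$A(\theta) = \big(b_{j\ell}(\theta)\big)_{j,\ell=0}^{1}$$

is the matrix representation of the reduction of $A_h$ to $E_\theta$. For the operator $L_h$, given by (3.2), we get the matrix

$$L(\theta) = \frac{2}{h^2}\begin{pmatrix} 1 & -\cos\theta\\ -\cos\theta & 1\end{pmatrix}.$$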


For $I_h^{2h}$, we may use (3.4) directly with even-numbered $x_j$. The functions

$$x_j \in \Omega_{2h} \mapsto e^{i\theta x_j/h}, \qquad (3.6)$$

with the same range of $\theta$ as above, form an orthogonal basis of $H_{2h}$. So $I_h^{2h}$ maps $E_\theta$ into the span of the complex exponential function (3.6), with the corresponding matrix

$$I(\theta) = \tfrac12\,(1,\ \cos\theta).$$
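Formula (3.5) is mechanical enough to code directly. The sketch below is our own illustration: the helper `symbol_matrix` is hypothetical and NumPy is assumed. It builds $(b_{j\ell}(\theta))$ from any 3-point stencil $(a_{-1}, a_0, a_1)$ and recovers $L(\theta)$ from the stencil of (3.2). It also recovers $I(\theta)$ by keeping only the even-parity row of the restriction stencil $\tfrac14[1, 2, 1]$, as described above.

```python
import numpy as np

def symbol_matrix(stencil, theta):
    """2 x 2 matrix (b_{jl}(theta)) of (3.4)-(3.5) for a 3-point stencil.

    stencil = (a_{-1}, a_0, a_1); row j is the parity of the grid point x_j,
    column l selects the splitting function phi_l.
    """
    B = np.zeros((2, 2), dtype=complex)
    for j in range(2):
        for l in range(2):
            for k, a in zip((-1, 0, 1), stencil):
                if (j + k - l) % 2 == 0:          # l = j + k (mod 2)
                    B[j, l] += a * np.exp(1j * k * theta)
    return B

h, theta = 0.125, 0.7
L_theta = symbol_matrix(np.array([-1.0, 2.0, -1.0]) / h**2, theta)
I_theta = symbol_matrix((0.25, 0.5, 0.25), theta)[0]   # even (coarse) points only
```

The same helper applies unchanged to other 3-point stencils, for instance a damped smoother.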


Thus the restriction of the coarse grid correction operator $T_h$ given by (2.3) to the subspace $E_\theta$ preserves $E_\theta$. Letting $T(\theta)$ be the matrix representation of this restriction with respect to the basis $\{\psi_{k,\theta}: k = 0,1\}$, we have, by (2.2),

$$T(\theta)\,L(\theta)^{-1} = L(\theta)^{-1} - I(\theta)^{T}\big[I(\theta)\,L(\theta)\,I(\theta)^{T}\big]^{-1} I(\theta).$$

The case $\theta = 0$ requires special attention: then the inverses above do not exist, and one has to recall that the space $\hat H_h$ was defined as the factor space modulo constants. Therefore, for $\theta = 0$ we may add to $L(0)$ the matrix

$$cE = \begin{pmatrix} c & c\\ c & c\end{pmatrix}, \qquad c \ne 0,$$

thus replacing $L(0)$ above by $L(0) + cE$. Then the inverses do exist, and the result, of course, does not depend on $c$. We thus have

$$\delta = \rho(L_h)\,\rho(T_h L_h^{-1}) = \max_{\theta = 2k\pi/n} \rho\big(L(\theta)\big) \cdot \max_{\theta = 2k\pi/n} \rho\big(T(\theta)\,L(\theta)^{-1}\big).$$
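Before relying on the $2\times 2$ reduction, it can be reassuring to compute $\delta$ directly from assembled matrices. The sketch below is our own cross-check, not the authors' procedure. It assumes NumPy and $h = 1/n$ with $n$ even and at least 8, and its function names are hypothetical. It inverts on the factor space modulo constants with the same $cE$ regularisation, then projects out constants. For the 1-D model problem it should return $\delta = 2$, since $\rho(L_h) = 4/h^2$ and $\rho(T_h L_h^{-1}) = h^2/2$, and the result should not depend on $c$.

```python
import numpy as np

def periodic_laplacian(n, h):
    eye = np.eye(n)
    return (2.0 * eye - np.roll(eye, 1, axis=1) - np.roll(eye, -1, axis=1)) / h**2

def delta_full_matrix(n, c=1.0):
    """delta = rho(L_h) * rho(T_h L_h^{-1}) from assembled matrices (n even, n >= 8)."""
    h, m = 1.0 / n, n // 2
    L, L2 = periodic_laplacian(n, h), periodic_laplacian(m, 2 * h)
    P = np.zeros((n, m))
    for J in range(m):
        P[2 * J, J] = 1.0
        P[2 * J + 1, J] = P[2 * J + 1, (J + 1) % m] = 0.5
    R = P.T / 2.0                                   # adjoint w.r.t. <.,.>_h and <.,.>_2h

    def factor_inverse(A):
        # inverse on the factor space modulo constants: regularise with c * ones,
        # then project out constants (the projected result does not depend on c)
        k = A.shape[0]
        Q = np.eye(k) - np.ones((k, k)) / k
        return Q @ np.linalg.inv(A + c * np.ones((k, k))) @ Q

    TLinv = factor_inverse(L) - P @ factor_inverse(L2) @ R   # T_h L_h^{-1}
    rho_L = np.max(np.abs(np.linalg.eigvalsh(L)))
    rho_TL = np.max(np.abs(np.linalg.eigvalsh(TLinv)))
    return rho_L * rho_TL
```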


It is easy to see that the subspace $E_\theta$ coincides with the span of the functions $x \mapsto e^{i\theta x/h}$ and $x \mapsto e^{i(\theta+\pi)x/h}$ used before. So the difference in our approach consists in using a special basis of $E_\theta$. A similar observation can be made in more complicated cases as well.

4.  THE  3-D  ELASTICITY  PROBLEM 

Let $\Omega = (0,b_1)\times(0,b_2)\times(0,b_3)$. Denote by $H$ the space of functions $u: \mathbb{R}^3 \to \mathbb{C}^3$ which are in $(H^1_{\mathrm{loc}}(\mathbb{R}^3))^3$ (i.e., the restriction of $u$ to any open bounded set $\Omega'$ is in $(H^1(\Omega'))^3$) and which are $b$-periodic:

$$u(x) = u(x + kb), \qquad (4.1)$$

where

$$x + kb = (x_1 + k_1 b_1,\ x_2 + k_2 b_2,\ x_3 + k_3 b_3), \quad x = (x_1, x_2, x_3), \quad k = (k_1, k_2, k_3) \in \mathbb{Z}^3, \quad b = (b_1, b_2, b_3).$$


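Define the bilinear form $a(\cdot,\cdot)$ on $H$ by

$$a(u,v) = \int_\Omega \Big[\lambda\,(\operatorname{div} u)(\operatorname{div} v) + 2\mu \sum_{i,j=1}^{3} \varepsilon_{ij}(u)\,\varepsilon_{ij}(v)\Big], \qquad (4.2)$$

where

$$\varepsilon_{ij}(u) = \frac12\Big(\frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i}\Big),$$

and $\lambda > 0$, $\mu > 0$ are the Lamé elasticity coefficients; cf., e.g., [12].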
A simple 3-D linear elasticity problem in $\Omega$ with periodic boundary conditions that we consider here is then the variational problem of finding the displacement vector field $u$ satisfying

$$u \in H: \quad a(u,v) = f(v) \quad \text{for all } v \in H, \qquad (4.3)$$

where the right-hand side functional is given by

$$f(v) = \int_\Omega \sum_{i=1}^{3} f_i\, v_i.$$

Here, $f_i$ are (real) given body forces satisfying the equilibrium conditions

$$\int_\Omega f_i = 0, \qquad i = 1,2,3.$$

Equation (4.3) expresses the minimum conditions for the energy functional $J(u) = \tfrac12 a(u,u) - f(u)$, $u$ real.

Problem (4.3) has a solution which is unique up to a constant in each direction: if $u = (u_1,u_2,u_3)$ solves (4.3), then $u + v$, $v$ constant, is also a solution. (Note that the remaining components of the null space, cf. [12, p. 91], are eliminated by the periodic boundary conditions.)

Let $n = (n_1,n_2,n_3) \in \mathbb{Z}^3$, $n_i > 0$ even, and $h = (h_1,h_2,h_3)$, $h_i = b_i/n_i$. Set

$$\Omega_h = \{x_k : 1 \le k \le n\}$$

with

$$x_k = (k_1h_1,\ k_2h_2,\ k_3h_3), \qquad k \in \mathbb{Z}^3.$$

(Inequalities are to be understood componentwise.) Set

$$V_h = \{u: \Omega_h \to \mathbb{C}\}, \qquad H_h = \{u = (u_1,u_2,u_3): u_s \in V_h\},$$

and every $u \in H_h$ is considered to be extended by periodicity (4.1) to all points in $\mathbb{R}^3$ of the form $x_k$, $k \in \mathbb{Z}^3$.

The space $H_h$ is equipped with the inner product

$$\langle u_h, v_h\rangle_h = h_1h_2h_3 \sum_{s=1}^{3} \sum_{x_k \in \Omega_h} u_{hs}(x_k)\,\overline{v_{hs}(x_k)}. \qquad (4.4)$$

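A discretization of problem (4.3) will be written as

$$L_h u_h = f_h. \qquad (4.5)$$

Setting $L_h = (L_h^{st})_{s,t=1}^{3}$, $u_h = (u_{ht})_{t=1}^{3}$, $f_h = (f_{hs})_{s=1}^{3}$, we may write (4.5) as

$$\begin{pmatrix} L_h^{11} & L_h^{12} & L_h^{13}\\ L_h^{21} & L_h^{22} & L_h^{23}\\ L_h^{31} & L_h^{32} & L_h^{33}\end{pmatrix} \begin{pmatrix} u_{h1}\\ u_{h2}\\ u_{h3}\end{pmatrix} = \begin{pmatrix} f_{h1}\\ f_{h2}\\ f_{h3}\end{pmatrix}.$$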

The linear operators $L_h^{st}: V_h \to V_h$ will be given by stencils of the form

$$L_h^{st} = \big[\ell_k^{st}\big]_{-1 \le k \le 1},$$

which means that

$$(L_h^{st} u_{ht})(x_k) = \sum_{-1 \le j \le 1} \ell_j^{st}\, u_{ht}(x_{k+j}), \qquad x_k \in \Omega_h. \qquad (4.6)$$

We choose to determine the stencil coefficients from the variational formulation (4.3) in a way equivalent to the use of multilinear finite elements (other ways are possible). Define the multilinear basis functions

$$\phi_k^h(x) = \chi\Big(\frac{x_1 - k_1h_1}{h_1}\Big)\,\chi\Big(\frac{x_2 - k_2h_2}{h_2}\Big)\,\chi\Big(\frac{x_3 - k_3h_3}{h_3}\Big),$$

where $\chi(t) = \max\{0, 1 - |t|\}$. We may then define the multilinear interpolation operator $P_h: H_h \to H$ by

$$[P_h u_h](x) = \sum_{s=1}^{3} \sum_{k \in \mathbb{Z}^3} u_s(x_k)\,\phi_k^h(x)\, e_s, \qquad u_h = (u_1,u_2,u_3) \in H_h,\ x \in \mathbb{R}^3,$$

where $e_s$ is the $s$-th coordinate vector in $\mathbb{R}^3$.

The discretization of (4.3) is now given by

$$u_h \in H_h: \quad a(P_h u_h, P_h v_h) = f(P_h v_h) \quad \text{for all } v_h \in H_h.$$

We get the right-hand side $f_h$ from

$$f(P_h v_h) = \langle f_h, v_h\rangle_h \quad \text{for all } v_h \in H_h,$$

that is,

$$f_{hs}(x_k) = \frac{f(\phi_k^h e_s)}{h_1h_2h_3}.$$

The discrete operator $L_h$ is determined by

$$a(P_h u_h, P_h v_h) = \langle L_h u_h, v_h\rangle_h \quad \text{for all } u_h, v_h \in H_h, \qquad (4.7)$$

which gives the stencil coefficients

$$\ell_k^{st} = \frac{a(\phi_{m+k}^h e_t,\ \phi_m^h e_s)}{h_1h_2h_3}$$

(independently of the choice of $m = (m_1,m_2,m_3)$).

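Replacing $n$ by $n/2$ and $h$ by $2h$, we obtain similarly $\Omega_{2h}$, $H_{2h}$, and the discretization in $H_{2h}$ (in the stencils, the index $h$ was omitted to simplify notation). The prolongation operator $I_{2h}^h: H_{2h} \to H_h$ is defined by interpolation,

$$(I_{2h}^h u_{2h})(x_k) = (P_{2h} u_{2h})(x_k), \qquad x_k \in \Omega_h.$$

The inner product $\langle\cdot,\cdot\rangle_{2h}$ is defined on $H_{2h}$ analogously to (4.4). The restriction operator $I_h^{2h}: H_h \to H_{2h}$ is defined by the transpose $I_h^{2h} = (I_{2h}^h)^{*}$ relative to the inner products $\langle\cdot,\cdot\rangle_h$ and $\langle\cdot,\cdot\rangle_{2h}$. The restriction stencil is then given by

$$I_h^{2h} = \big[r_k^{st}\big], \qquad r_k^{st} = 0 \ \text{if } s \ne t, \qquad r_k^{ss} = r_k = \frac18\Big(1 - \frac{|k_1|}{2}\Big)\Big(1 - \frac{|k_2|}{2}\Big)\Big(1 - \frac{|k_3|}{2}\Big), \quad -1 \le k \le 1.$$

This means that

$$(I_h^{2h} u_h)_s(x_k) = \sum_{-1 \le j \le 1} r_j\, u_{hs}(x_{k+j}), \qquad x_k \in \Omega_{2h}.$$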
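Since $r_k$ is a tensor product of the 1-D full-weighting factors $(\tfrac14, \tfrac12, \tfrac14)$, it is convenient to build it with an outer product. The sketch below is our own illustration with NumPy assumed, and the function names are hypothetical. It forms the 27 weights and applies them to one periodic displacement component stored on an $n_1\times n_2\times n_3$ array, keeping the even (coarse) points. The weights sum to one, so constant fields are reproduced.

```python
import numpy as np

def restriction_stencil_3d():
    """27-point weights r_k = 1/8 prod_i (1 - |k_i|/2), stored as r[k1+1, k2+1, k3+1]."""
    k = np.array([-1, 0, 1])
    w1 = 0.5 * (1.0 - np.abs(k) / 2.0)          # 1-D factor: (1/4, 1/2, 1/4)
    return np.einsum("i,j,k->ijk", w1, w1, w1)   # tensor product of the 1-D weights

def restrict_component(u, r):
    """Apply the stencil to one periodic component u[n1, n2, n3]; keep even points."""
    out = np.zeros_like(u, dtype=float)
    for j1 in (-1, 0, 1):
        for j2 in (-1, 0, 1):
            for j3 in (-1, 0, 1):
                # np.roll(u, -j) places u(x_{k+j}) at position k (periodic wrap-around)
                out += r[j1 + 1, j2 + 1, j3 + 1] * np.roll(u, (-j1, -j2, -j3), axis=(0, 1, 2))
    return out[::2, ::2, ::2]
```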

The null space of $L_h$ consists of constant vectors:

$$K_h = \{u \in H_h : u(x_j) = u(x_k) \text{ for all } x_j, x_k \in \Omega_h\}.$$

Similarly, we define $K_{2h} \subset H_{2h}$. Then these null spaces are invariant under our operators:

$$L_h K_h \subset K_h, \qquad I_{2h}^h K_{2h} = K_h, \qquad I_h^{2h} K_h = K_{2h}.$$

We may thus pass to the factor spaces

$$\hat H_h = H_h / K_h, \qquad \hat H_{2h} = H_{2h} / K_{2h}.$$

All operators under consideration induce operators between these factor spaces, and we keep the same notation for the induced operators. Also, we will not distinguish between an element $u_h \in H_h$ and the class $u_h + K_h \in \hat H_h$; there is no danger of confusion.

Having  established  the  discrete  problem,  we  may  now  proceed 
to  the  Fourier  analysis. 

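For $0 \le k \le 1$, define the grid splitting function

$$\phi_k(x_j) = \begin{cases} 1 & \text{if } j \equiv k \pmod 2,\\ 0 & \text{otherwise.}\end{cases} \qquad (4.8)$$

Then define the basis functions

$$\psi_{k,\theta}^{s}(x_j) = e_s\,\phi_k(x_j)\, e^{i\theta\cdot x_j/h}, \qquad x_j \in \Omega_h, \quad s = 1,2,3, \quad 0 \le k \le 1, \qquad (4.9)$$

with $e_s$ the $s$-th coordinate vector in $\mathbb{R}^3$ and

$$\theta\cdot x/h = \frac{\theta_1 x_1}{h_1} + \frac{\theta_2 x_2}{h_2} + \frac{\theta_3 x_3}{h_3}.$$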
The periodic boundary conditions imply for each component of $\theta$ that $\theta_j n_j = 2k_j\pi$, $k_j \in \mathbb{Z}$, so $\theta = 2\pi k/n$, $k \in \mathbb{Z}^3$ (componentwise). Restricting $\theta$ to an interval of length $\pi$, say

$$-\frac{\pi}{2} < \theta \le \frac{\pi}{2},$$

we get an orthogonal basis of the space $H_h$ consisting of the functions $\psi_{k,\theta}^{s}$.

Set

$$E_\theta = \operatorname{span}\{\psi_{k,\theta}^{s}: s = 1,2,3,\ 0 \le k \le 1\} \subset \hat H_h.$$

The dimension of $E_\theta$ as a subspace of $\hat H_h$ is $3 \times 2^3 = 24$ for $\theta \ne 0$ and $24 - 3 = 21$ for $\theta = 0$, because $K_h \subset E_\theta$ for $\theta = 0$. For the operator $L_h$ given by (4.6)-(4.7) we have, similarly as in (3.4),


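$$(L_h \psi_{k,\theta}^{s})(x_j) = \sum_{t=1}^{3} e_t\, b_{\ell k}^{ts}(\theta)\, e^{i\theta\cdot x_j/h}, \qquad \ell \equiv j \pmod 2,$$

where

$$b_{\ell k}^{ts}(\theta) = \sum_{\substack{-1 \le m \le 1\\ \ell + m \equiv k \ (\mathrm{mod}\ 2)}} \ell_m^{ts}\, e^{i\theta\cdot m} = \sum_{\substack{-1 \le m \le 1\\ m \equiv k - \ell \ (\mathrm{mod}\ 2)}} \ell_m^{ts} \cos(\theta\cdot m), \qquad (4.10)$$

using the symmetry

$$\ell_m^{ts} = \ell_{-m}^{ts}, \qquad (4.11)$$

which follows from the fact that the bilinear form $a$ is symmetric. Consequently, decomposing the function $e^{i\theta\cdot x/h}$ into the splitting functions, we get

$$L_h \psi_{k,\theta}^{s} = \sum_{t=1}^{3} \sum_{0 \le \ell \le 1} b_{\ell k}^{ts}(\theta)\, \psi_{\ell,\theta}^{t}.$$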

We can thus represent the reduction of $L_h$ onto the subspace $E_\theta$ by a $3\times 3$ block matrix with $8\times 8$ blocks. Each block corresponds to a pair of indices $t,s = 1,2,3$, and the ordering within blocks may be chosen as the lexicographical ordering of the vector indices $0 \le \ell \le 1$, $0 \le k \le 1$:

$$L(\theta) = \begin{pmatrix} B^{11} & B^{12} & B^{13}\\ B^{21} & B^{22} & B^{23}\\ B^{31} & B^{32} & B^{33}\end{pmatrix}, \qquad B^{ts} = \big(b_{\ell k}^{ts}(\theta)\big)_{0 \le \ell, k \le 1}.$$
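The cosine form of (4.10) translates into a short loop over parity vectors and stencil offsets. The sketch below is our own illustration with NumPy assumed, and `block_symbol` and `assemble_L` are hypothetical names. It builds one $8\times 8$ block from a $3\times3\times3$ stencil with the lexicographic ordering used above, assuming each stencil is even ($\ell_m = \ell_{-m}$), which the cosine form requires. It then stacks the nine blocks into $L(\theta)$. The elasticity stencils themselves would first have to be computed from (4.7), which is not done here. In the check below, the 7-point Laplacian serves only as a stand-in stencil.

```python
import itertools
import numpy as np

# lexicographic ordering of the parity vectors 0 <= k <= 1, and of the offsets -1 <= m <= 1
PARITIES = list(itertools.product((0, 1), repeat=3))
OFFSETS = list(itertools.product((-1, 0, 1), repeat=3))

def block_symbol(stencil, theta):
    """8 x 8 block (b_{lk}(theta)) of (4.10) for one 3x3x3 stencil.

    stencil[m1+1, m2+1, m3+1] = ell_m; the cosine form assumes ell_m = ell_{-m}.
    """
    theta = np.asarray(theta, dtype=float)
    B = np.zeros((8, 8))
    for a, l in enumerate(PARITIES):
        for b, k in enumerate(PARITIES):
            for m in OFFSETS:
                # keep offsets with m = k - l (mod 2) in every component
                if all((mi - ki + li) % 2 == 0 for mi, ki, li in zip(m, k, l)):
                    B[a, b] += stencil[m[0] + 1, m[1] + 1, m[2] + 1] * np.cos(theta @ np.array(m))
    return B

def assemble_L(stencils, theta):
    """24 x 24 matrix L(theta); stencils[t][s] is the 3x3x3 stencil of L_h^{ts}."""
    return np.block([[block_symbol(stencils[t][s], theta) for s in range(3)] for t in range(3)])
```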
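Similarly, we have for the restriction operator that

$$(I_h^{2h} \psi_{k,\theta}^{s})(x_j) = e_s\, c_k(\theta)\, e^{i\theta\cdot x_j/h}, \qquad x_j \in \Omega_{2h}, \quad s = 1,2,3,$$

with

$$c_k(\theta) = \sum_{\substack{-1 \le m \le 1\\ m \equiv k \ (\mathrm{mod}\ 2)}} r_m \cos(\theta\cdot m). \qquad (4.12)$$

So $I_h^{2h}$ maps $E_\theta$ into the span of the functions $x \mapsto e_s\, e^{i\theta\cdot x/h}$, $s = 1,2,3$, in $H_{2h}$, with the $3\times 24$ matrix representation

$$I(\theta) = \begin{pmatrix} c(\theta)^{T} & 0 & 0\\ 0 & c(\theta)^{T} & 0\\ 0 & 0 & c(\theta)^{T}\end{pmatrix}, \qquad c(\theta) = \big(c_k(\theta)\big)_{0 \le k \le 1} \in \mathbb{R}^{8}.$$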
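The restriction symbol is equally easy to tabulate. In the illustrative sketch below (NumPy assumed, `restriction_symbol` a hypothetical name), $c(\theta)$ is computed from (4.12) with the product weights $r_m$. The code then places $c(\theta)^T$ on the block diagonal of the $3\times 24$ matrix $I(\theta)$. Because $r_m$ is a product of 1-D weights, $c_k(\theta)$ also factors as $\tfrac18\prod_i(1 \text{ or } \cos\theta_i)$, which the check below uses.

```python
import itertools
import numpy as np

PARITIES = list(itertools.product((0, 1), repeat=3))
R1 = {-1: 0.25, 0: 0.5, 1: 0.25}   # 1-D factors of r_m (the 3-D weight is their product)

def restriction_symbol(theta):
    """3 x 24 matrix I(theta): block-diagonal copies of the row c(theta)^T from (4.12)."""
    theta = np.asarray(theta, dtype=float)
    c = np.zeros(8)
    for idx, k in enumerate(PARITIES):
        for m in itertools.product((-1, 0, 1), repeat=3):
            if all((mi - ki) % 2 == 0 for mi, ki in zip(m, k)):   # m = k (mod 2)
                r_m = R1[m[0]] * R1[m[1]] * R1[m[2]]
                c[idx] += r_m * np.cos(theta @ np.array(m))
    I = np.zeros((3, 24))
    for s in range(3):
        I[s, 8 * s: 8 * s + 8] = c
    return I
```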


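Because the bases are orthogonal and normalized, we may represent $I_{2h}^h$ by the transposed $24\times 3$ matrix $I(\theta)^T$. So we can represent the restriction of $T_h L_h^{-1}$ (cf. (2.3)) to $E_\theta$ by the $24\times 24$ matrix

$$T(\theta)\,L(\theta)^{-1} = L(\theta)^{-1} - I(\theta)^{T}\big(I(\theta)\,L(\theta)\,I(\theta)^{T}\big)^{-1} I(\theta). \qquad (4.13)$$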


For $\theta = (0,0,0)$, however, one must remember that $L(0)$ turns singular and that we compute in the factor space modulo the null space $K_h$ of $L_h$. Since $K_h$ consists of constant functions, all functions of the form

$$\sum_{0 \le k \le 1} \psi_{k,\theta}^{s}, \qquad \theta = (0,0,0), \quad s = 1,2,3,$$

are equivalent to the zero function. We may thus replace $L(0)$ in (4.13) by $L(0) + cE_3$, $c > 0$, where $E_3$ is the $3\times 3$ block diagonal matrix with $8\times 8$ diagonal blocks of all ones. This does not change the result in the factor space, and the inverses exist in the usual sense.

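This reduction to the subspaces $E_\theta$ implies that

$$\rho(L_h) = \max\big\{\rho(L(\theta)) : -\tfrac{\pi}{2} < \theta \le \tfrac{\pi}{2},\ \theta = 2k\pi/n,\ k \in \mathbb{Z}^3\big\} \qquad (4.14)$$

and

$$\rho(T_h L_h^{-1}) = \max\big\{\rho(T(\theta)\,L(\theta)^{-1}) : -\tfrac{\pi}{2} < \theta \le \tfrac{\pi}{2},\ \theta = 2k\pi/n,\ k \in \mathbb{Z}^3\big\}, \qquad (4.15)$$

which allows us to compute the quantity $\delta_h = \rho(L_h)\,\rho(T_h L_h^{-1})$ numerically.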
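A direct transcription of (4.13)-(4.15) is sketched below. It is our own illustration, not the authors' code: NumPy is assumed and the names are hypothetical. It takes callables for $L(\theta)$ and $I(\theta)$ and regularises $L(0)$ with $cE$ as described above. It then returns $\delta_h$ as the product of the two maxima. As a usage example we feed it the 1-D symbols of Section 3, for which it should reproduce $\delta = 2$ for any $c$. The 24×24 elasticity symbols could be passed in the same way once the stencil coefficients $\ell_k^{st}$ are available.

```python
import numpy as np

def delta_from_symbols(L_of, I_of, thetas, E_zero, c=1.0):
    """delta_h = rho(L_h) * rho(T_h L_h^{-1}) via (4.13)-(4.15).

    L_of(theta), I_of(theta): symbol matrices; E_zero: all-ones block matrix
    used to regularise L(0); thetas: the admissible frequencies.
    """
    rho_L = rho_TL = 0.0
    for theta in thetas:
        L = L_of(theta)
        I = I_of(theta)
        rho_L = max(rho_L, np.max(np.abs(np.linalg.eigvalsh(L))))
        if np.allclose(theta, 0.0):
            L = L + c * E_zero                      # work modulo constants
        coarse = np.linalg.inv(I @ L @ I.T)
        TLinv = np.linalg.inv(L) - I.T @ coarse @ I   # (4.13)
        rho_TL = max(rho_TL, np.max(np.abs(np.linalg.eigvalsh(TLinv))))
    return rho_L * rho_TL

# usage with the 1-D symbols of Section 3 (n divisible by 4)
n, h = 16, 1.0 / 16
thetas = [2 * np.pi * k / n for k in range(-n // 4 + 1, n // 4 + 1)]   # -pi/2 < theta <= pi/2
L1 = lambda t: 2 / h**2 * np.array([[1.0, -np.cos(t)], [-np.cos(t), 1.0]])
I1 = lambda t: 0.5 * np.array([[1.0, np.cos(t)]])
delta = delta_from_symbols(L1, I1, thetas, np.ones((2, 2)))
```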

Our  numerical  results  are  summarized  in  Table  1. 

As usual in engineering practice, we have expressed the Lamé coefficients in (4.2) in terms of Young's modulus $E$ and Poisson's ratio $\sigma$ by

$$\mu = \frac{E}{2(1+\sigma)}, \qquad \lambda = \frac{E\sigma}{(1+\sigma)(1-2\sigma)},$$

cf. [12]. Because $E$ here affects only the scale of the operator $L_h$, the results depend on $\sigma$ only.
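The conversion is a two-line formula. In the small sketch below (the function name is hypothetical), the ratio $\lambda/\mu = 2\sigma/(1-2\sigma)$ comes out independent of $E$. This is why only $\sigma$ matters for $\delta$, which is invariant under scaling of $L_h$.

```python
def lame_coefficients(E, sigma):
    """Lame coefficients (lambda, mu) from Young's modulus E and Poisson's ratio sigma."""
    if not -1.0 < sigma < 0.5:
        raise ValueError("Poisson's ratio must lie in (-1, 1/2)")
    mu = E / (2.0 * (1.0 + sigma))
    lam = E * sigma / ((1.0 + sigma) * (1.0 - 2.0 * sigma))
    return lam, mu
```

For example, $\sigma = 0.30$ gives $\lambda/\mu = 1.5$ whatever the value of $E$.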
TABLE 1

The Quantity δ as a Function of the Ratio h1 : h2 : h3 and of Poisson's Ratio σ

h1 : h2 : h3    σ = .10    σ = .20    σ = .30    σ = .40
1:1:1              4.76       5.33       7.00      12.00
1:1:2             18.00      21.33      28.00      48.00
1:1:3             40.50      48.00      63.00     108.00
1:2:3             40.50      48.00      63.00     108.00

The values of $\delta$ in Table 1 are valid for all values of $n$, because the maxima were attained at $\theta = 0$ for $\rho(L(\theta))$, and at $\theta_1 = \theta_2 = 0$, $\theta_3 \ne 0$ arbitrary, for $\rho(T(\theta)\,L(\theta)^{-1})$. However, this is true only for $h_1 \le h_2 \le h_3$ and $\sigma$ within the range of Table 1; in the general case, the maxima may be (and we observed that in some cases are) attained at other points which change with $n$.

Finally, note that typical values of Poisson's ratio $\sigma$ are around 0.30 for most metals. For $\sigma = 0.30$ and $h_1:h_2:h_3 = 1:1:1$, we have $\delta = 7$. Then (2.5) gives the following V-cycle convergence bound for $\nu$ steps of Richardson's iteration, used as a pre-smoother as well as a post-smoother:

$$\varepsilon \le \frac{1}{1 + 3\nu/7}.$$

This yields the bounds $\varepsilon = 0.7$ for $\nu = 1$ and $\varepsilon \approx 0.54$ for $\nu = 2$.
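The two quoted numbers follow from exact rational arithmetic. The helper below is hypothetical. It encodes only the bound displayed above for $\delta = 7$, not the general estimate (2.5).

```python
from fractions import Fraction

def richardson_vcycle_bound(nu):
    """The bound 1 / (1 + 3 nu / 7) quoted for sigma = 0.30, h1:h2:h3 = 1:1:1 (delta = 7)."""
    return 1 / (1 + Fraction(3 * nu, 7))

bounds = {nu: richardson_vcycle_bound(nu) for nu in (1, 2, 3)}
```

Here $\nu = 1$ gives $7/10$ and $\nu = 2$ gives $7/13 \approx 0.538$.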
ABSTRACT

A multigrid algorithm has been developed for solving the steady-state Euler equations in two dimensions on unstructured triangular meshes. The method assumes the various coarse and fine grids of the multigrid sequence to be independent of one another, thus decoupling the grid generation procedure from the multigrid algorithm. The transfer of variables between the various meshes employs a tree-search algorithm which rapidly identifies regions of overlap between coarse and fine grid cells. Finer meshes are obtained either by regenerating new globally refined meshes, or by adaptively refining the previous coarser mesh. For both cases, the observed convergence rates are comparable to those obtained with structured multigrid Euler solvers. The adaptively generated meshes are shown to produce solutions of higher accuracy with fewer mesh points.


1.  INTRODUCTION 

The ability to predict flow patterns and aerodynamic forces about complex configurations in the transonic regime is of primary importance to the aircraft designer. For slender bodies at small angles of attack, the flow remains attached, and the effect of viscosity is confined to relatively small boundary-layer and wake regions. Thus, an accurate description of the flow can be achieved using the inviscid Euler equations. These represent a system of non-linear partial differential equations in space and time.

Steady-state solutions of the Euler equations about simple geometries in two and three dimensions have become fairly widespread over the past few years. However, for more complex geometries, the generation of suitable meshes remains an obstacle. One approach which has recently received increased attention in the literature is the use of unstructured triangular or tetrahedral meshes in two or three dimensions, respectively [1,2]. The advantages of unstructured meshes are two-fold. Firstly, they provide a means for generating meshes about arbitrarily complex configurations. Secondly, they provide a natural setting for the use of adaptive meshing techniques, where local flow properties or error estimates are used to determine the distribution of mesh nodes. Because adaptive mesh refinement is a procedure which generally destroys the structure of an existing mesh, its implementation has often been constrained by the need to preserve the mesh structure. This has led to block structured composite meshes, where zonal regions are refined uniformly to preserve structure [3]. With unstructured meshes, these constraints are removed, and much more effective refinement strategies may be devised to develop "optimum" meshes.

On the other hand, unstructured mesh flow solvers are generally much less efficient than available structured mesh solvers. Unstructured mesh solvers suffer from inherent limitations, such as the need to store the mesh connectivity, and the use of gather-scatter operations on vector computers. However, it is also evident that the development of unstructured mesh flow solvers has not kept pace with advances in structured mesh solvers. Many of the ideas developed for structured mesh solvers, such as approximate factorization and nested multigrid methods, cannot be applied to unstructured meshes. They must either be modified or abandoned in favor of more general algorithms.

In this work, a multigrid algorithm for unstructured meshes is presented. The algorithm operates on a sequence of coarse and fine meshes and assumes no relation exists between the various meshes of the sequence. The meshes are generated by triangulating a given set of points in the flow-field using the Delaunay triangulation algorithm [4]. The distribution of mesh points is either determined by conformal mapping techniques, or by adaptive refinement of the previous coarser grid. The decision to adopt a multigrid strategy involving a sequence of unrelated meshes was motivated by the desire to optimize both the accuracy and the efficiency of the solver. This type of approach has previously been attempted by Lohner and Morgan [5] for elliptic problems. Other approaches [6] have suggested using a sequence of unstructured nested meshes, where finer meshes are constructed by successively subdividing the cells of a coarse unstructured grid in some manner. However, for a multigrid algorithm, the accuracy of the solution is determined uniquely by the finest grid, whereas the convergence rate is determined by the coarsest grid of the sequence. The present approach provides the maximum flexibility for determining the configuration of the coarse and fine grids of the sequence, thus optimizing the efficiency and accuracy of the solver. Furthermore, when adaptive meshing techniques are employed, sequences of nested meshes can be obtained only by resorting to local mesh enrichment. However, much more sophisticated adaptive techniques are presently being advocated in the literature, mainly in the interest of obtaining directional refinements and smoothly varying meshes. These include a combination of mesh enrichment and moving meshes [7], and complete remeshing using coarse grid flow variables as weighting functions [8]. The present multigrid strategy can be used in conjunction with any of these techniques.


2.  DISCRETIZATION  OF  THE  GOVERNING  EQUATIONS 

The variables to be determined are the pressure, density, Cartesian velocity components, total energy and total enthalpy, denoted by $p$, $\rho$, $u$, $v$, $E$, and $H$, respectively. Since for a perfect gas we have

$$E = \frac{p}{(\gamma-1)\rho} + \frac{u^2+v^2}{2}, \qquad H = E + \frac{p}{\rho},$$

where $\gamma$ is the ratio of specific heats, we need only solve for the four variables $\rho$, $\rho u$, $\rho v$, and $\rho E$.
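The two relations above are all that is needed to move between the stored conservative variables and the primitive ones. The sketch below is illustrative only. The value $\gamma = 1.4$ (air) is our assumption, since the text does not fix it, and the function names are hypothetical.

```python
import numpy as np

GAMMA = 1.4  # assumed ratio of specific heats (air); not specified in the text

def conservative(rho, u, v, p, gamma=GAMMA):
    """w = (rho, rho u, rho v, rho E) with E = p / ((gamma - 1) rho) + (u^2 + v^2) / 2."""
    E = p / ((gamma - 1.0) * rho) + 0.5 * (u * u + v * v)
    return np.array([rho, rho * u, rho * v, rho * E])

def primitive(w, gamma=GAMMA):
    """Recover (rho, u, v, p, H) from w, using H = E + p / rho."""
    rho, ru, rv, rE = w
    u, v, E = ru / rho, rv / rho, rE / rho
    p = (gamma - 1.0) * rho * (E - 0.5 * (u * u + v * v))
    return rho, u, v, p, E + p / rho
```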
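These values are determined by solving the Euler equations, which in integral form read

$$\frac{\partial}{\partial t}\iint_\Omega w\,dx\,dy + \oint_{\partial\Omega} (f\,dy - g\,dx) = 0, \qquad (1)$$

where $\Omega$ is a fixed area with boundary $\partial\Omega$, $x$ and $y$ are Cartesian coordinates, and

$$w = \begin{pmatrix} \rho\\ \rho u\\ \rho v\\ \rho E\end{pmatrix}, \qquad f = \begin{pmatrix} \rho u\\ \rho u^2 + p\\ \rho u v\\ \rho u H\end{pmatrix}, \qquad g = \begin{pmatrix} \rho v\\ \rho v u\\ \rho v^2 + p\\ \rho v H\end{pmatrix}.$$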

The  w  variables  are  stored  at  the  vertices  of  each  triangle.  The  control  volume  for  vertex  i  is 
defined  as  the  union  of  all  triangles  having  a  vertex  at  i,  as  shown  in  Figure  1.  The  boundary 
flux  integral  in  equation  (1)  is  approximated  by  first  calculating  the  values  of  the  fluxes  f  and  g 
at  the  nodes  on  the  outer  boundary  of  this  control  volume.  These  can  then  be  integrated  about 
the  control  volume  boundary  by  assuming  that  on  each  edge,  the  value  of  the  flux  can  be  taken 
as  the  average  of  the  two  values  on  either  end  of  the  edge.  This  finite-volume  formulation  can 
be  shown  to  be  equivalent  to  a  Galerkin  finite-element  approximation,  with  a  lumped  mass 
matrix,  and  is  second-order  accurate  in  space  [1]. 
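The boundary flux integral with edge-averaged fluxes amounts to the trapezoidal rule along the outer boundary of the control volume. The sketch below is our own illustration with NumPy assumed. It takes the boundary nodes in counterclockwise order, assumes $\gamma = 1.4$, and uses hypothetical names. A uniform state should give a zero flux balance, which is a useful consistency check.

```python
import numpy as np

GAMMA = 1.4  # assumed (air)

def fluxes(w, gamma=GAMMA):
    """Euler flux vectors f(w), g(w) for w = (rho, rho u, rho v, rho E)."""
    rho, ru, rv, rE = w
    u, v = ru / rho, rv / rho
    p = (gamma - 1.0) * (rE - 0.5 * rho * (u * u + v * v))
    H = (rE + p) / rho
    f = np.array([ru, ru * u + p, ru * v, ru * H])
    g = np.array([rv, rv * u, rv * v + p, rv * H])
    return f, g

def boundary_flux_integral(xy, w_nodes):
    """Approximate the integral of (f dy - g dx) around a closed polygon.

    xy: (N, 2) boundary nodes in counterclockwise order; w_nodes: (N, 4) states.
    On each edge the flux is the average of its two end values.
    """
    total = np.zeros(4)
    N = len(xy)
    for a in range(N):
        b = (a + 1) % N
        fa, ga = fluxes(w_nodes[a])
        fb, gb = fluxes(w_nodes[b])
        dx, dy = xy[b] - xy[a]
        total += 0.5 * (fa + fb) * dy - 0.5 * (ga + gb) * dx
    return total
```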

Additional  dissipative  terms  are  needed  to  prevent  odd-even  point  decoupling,  and  to 
prevent  the  formation  of  numerical  oscillations  near  a  shock.  Artificial  dissipation  terms  are 
constructed  as  a  blend  of  second  and  fourth  differences  in  the  flow  variables,  where  the 
differences  are  taken  along  each  edge  of  the  mesh.  Thus,  for  example,  the  second  differences 
of  w  at  node  i  are  calculated  as 

$$\nabla^2 w_i = \sum_{k=1}^{n} (w_i - w_k) \qquad (2)$$

where  n  is  the  number  of  edges  meeting  at  node  i,  and  wk  represents  the  value  of  w  at  the 
other  end  of  each  edge  (cf.  Figure  1).  Fourth  differences  are  constructed  by  first  computing 
and  storing  the  second  differences,  as  shown  above,  and  then  differencing  these  values  again. 
This can be achieved by replacing the flow variables in equation (2) with the previously calculated second differences. The fourth difference terms form the background dissipation, which is applied throughout the flow-field. These terms can be shown to be third-order accurate, and thus the second-order accuracy of the scheme is preserved. The second differences represent stronger first-order dissipation which is needed to prevent oscillations near shocks. Because this strong dissipation compromises the accuracy of the scheme, it is applied only in the vicinity of a shock, and is turned off elsewhere. This behavior is achieved by multiplying the second differences by an adaptive coefficient, constructed as a second difference in the pressure. This coefficient assumes a small value (order $\Delta x^2$) in regions of smooth flow, and becomes of order 1 near a shock. The present formulation of the dissipative terms is analogous to that used by Jameson for structured quadrilateral meshes [9], and provides a scheme which is second-order accurate everywhere, except in the vicinity of a shock where it becomes locally first-order accurate.
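The edge-based construction of the second and fourth differences can be sketched in a few lines. This is our own illustration: NumPy is assumed, the mesh is given as a list of edges with each edge listed once, the sign convention follows equation (2) as written, and the function names are hypothetical. Each edge is visited once and contributes equal and opposite amounts to its two end nodes. The fourth difference simply re-applies the operator to the stored second differences.

```python
import numpy as np

def second_difference(w, edges):
    """Edge-based second difference (2): d[i] = sum over edges (i, k) of (w_i - w_k).

    w: (N, ...) nodal values; edges: iterable of (i, k) pairs, each edge listed once.
    """
    d = np.zeros_like(w, dtype=float)
    for i, k in edges:
        diff = w[i] - w[k]
        d[i] += diff      # contribution to node i
        d[k] -= diff      # the same edge seen from node k: (w_k - w_i)
    return d

def fourth_difference(w, edges):
    """Background dissipation: the same operator applied to stored second differences."""
    return second_difference(second_difference(w, edges), edges)
```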

3.  INTEGRATION  TO  A  STEADY-STATE 

Discretization of the Euler equations in space transforms the governing equations into a set of coupled ordinary differential equations which must be integrated in time to obtain the steady-state solution. Thus, equation (1) becomes the set

$$S_i \frac{dw_i}{dt} + \big[Q(w_i) - D(w_i)\big] = 0, \qquad i = 1, 2, 3, \ldots$$

where $S_i$ is the area of the control volume $i$, and is independent of time. The convective operator $Q(w)$ represents the discrete approximation to the flux integral in (1), and the dissipative operator $D(w)$ represents the artificial dissipation terms. These equations are integrated in time using a fully explicit 5-stage hybrid time-stepping scheme, where the operator $Q(w)$ is evaluated at each stage in the time step, and the operator $D(w)$ is only evaluated in the first two stages and then frozen at that value. Thus we advance in time as


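$$\begin{aligned}
w^{(0)} &= w^{n},\\
w^{(1)} &= w^{(0)} - \alpha_1 \frac{\Delta t}{S}\big[Q(w^{(0)}) - D(w^{(0)})\big],\\
w^{(2)} &= w^{(0)} - \alpha_2 \frac{\Delta t}{S}\big[Q(w^{(1)}) - D(w^{(1)})\big],\\
w^{(3)} &= w^{(0)} - \alpha_3 \frac{\Delta t}{S}\big[Q(w^{(2)}) - D(w^{(1)})\big],\\
w^{(4)} &= w^{(0)} - \alpha_4 \frac{\Delta t}{S}\big[Q(w^{(3)}) - D(w^{(1)})\big],\\
w^{(5)} &= w^{(0)} - \alpha_5 \frac{\Delta t}{S}\big[Q(w^{(4)}) - D(w^{(1)})\big],\\
w^{n+1} &= w^{(5)},
\end{aligned}$$

where $w^{n}$ and $w^{n+1}$ are the values at the beginning and the end of the $n$th time step. The standard values of the coefficients are

$$\alpha_1 = 1/4, \quad \alpha_2 = 1/6, \quad \alpha_3 = 3/8, \quad \alpha_4 = 1/2, \quad \alpha_5 = 1.$$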

This scheme represents a particular case of a large class of hybrid time-stepping schemes, which has been specifically designed to produce strong damping characteristics of high-frequency error modes. It is thus well suited to drive the multigrid algorithm.
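The stage structure, and in particular the freezing of the dissipation after the second stage, can be written as a short loop. The sketch below is ours, not the authors' code. `Q` and `D` are hypothetical callables standing in for the discrete operators, and `dt_over_S` is the local time step divided by the control-volume area. For the scalar model $Q(w) = \lambda w$, $D = 0$, one step reproduces the stability polynomial $1 - z + z^2/2 - 3z^3/16 + z^4/32 - z^5/128$ with $z = \lambda\Delta t/S$. This polynomial follows from the coefficients above and matches $e^{-z}$ to second order.

```python
ALPHAS = (1/4, 1/6, 3/8, 1/2, 1.0)   # standard coefficients quoted in the text

def five_stage_step(w, dt_over_S, Q, D, alphas=ALPHAS):
    """One step of the 5-stage hybrid scheme; D is evaluated in stages 1-2 only.

    Q, D: callables returning the convective and dissipative operators;
    dt_over_S: local time step divided by the control-volume area (scalar or array).
    """
    w0 = w
    wq = w0
    dissipation = None
    for q, alpha in enumerate(alphas):
        if q < 2:
            dissipation = D(wq)   # D(w^(0)) in stage 1, D(w^(1)) in stage 2, then frozen
        wq = w0 - alpha * dt_over_S * (Q(wq) - dissipation)
    return wq                     # w^(5) = w^{n+1}
```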

Convergence  to  a  steady-state  is  also  accelerated  by  using  the  maximum  permissible  time 
step  at  each  point  in  the  flow-field,  as  determined  by  local  stability  analysis,  by  the  use  of 
enthalpy  damping  [9],  and  implicit  residual  averaging  [10]. 
Figure 2

Grid Transfers for the Unstructured Multigrid Algorithm: Residual at "a" is distributed to A, B, and C; Flow Variable at P is the Linear Interpolation of Values at 1, 2, and 3.


4.  THE  FULL  MULTIGRID  ALGORITHM 

The basic idea of a multigrid strategy is to perform time steps on coarser meshes to calculate corrections to a solution on a finer mesh. The advantages of time-stepping on coarse
meshes  are  two-fold:  first,  the  permissible  time  step  is  much  larger,  since  it  is  proportional  to 
the  mesh  width,  and  secondly  the  work  involved  is  much  less  because  of  the  smaller  number 
of  grid  points.  In  order  to  combine,  without  compromise,  the  advantages  of  unstructured 
meshes  with  those  of  a  multigrid  strategy,  it  proves  convenient  to  decouple  the  grid  generation 
procedure  from  the  multigrid  algorithm.  Thus,  a  multigrid  method  which  operates  on  a 
sequence  of  unrelated  meshes  is  needed.  The  key  to  such  a  strategy  is  the  efficient  transfer  of 
flow  variables  back  and  forth  between  these  meshes. 

The  full  multigrid  algorithm  begins  by  computing  the  solution  to  the  problem  at  hand  on 
a  coarse  mesh.  When  convergence  has  been  reached,  a  new  finer  mesh  is  generated.  This  can 
either  be  performed  by  globally  regenerating  a  new  mesh  with  a  higher  density  of  mesh  points 
in  all  regions  of  the  flow-field,  or  by  adaptively  refining  the  existing  mesh.  Next,  the  patterns 
for transferring the flow variables back and forth between these two meshes must be determined. Since the meshes are unnested, this is a non-trivial task. It is performed using a tree-search algorithm, which is described in detail in a following section. For any given flow calculation, this operation is only performed once, immediately after the generation of the new mesh. Transfer coefficients and transfer addresses are computed and stored, and used subsequently in the flow calculations. For each fine mesh point, three transfer addresses determine the three coarse grid nodes of the cell enclosing the fine grid node, to which the variables are to be transferred (see Figure 2), and the weighting is given by the corresponding transfer coefficients. The flow variables are then transferred to the new fine mesh, and these serve as the initial conditions for time stepping on this mesh. A multigrid saw-tooth cycle is then used to solve the equations on the new finer mesh, using the previous mesh as the background coarse grid. When convergence is obtained, a third finer mesh is generated, the transfer patterns are determined, and the flow variables are transferred to the new mesh. Time stepping resumes on this mesh using all three meshes as a sequence in the multigrid saw-tooth cycle.
This procedure can be repeated as many times as necessary to obtain the desired accuracy, each time adding another mesh to the multigrid sequence. The full multigrid algorithm for a sequence of four meshes, beginning on the second mesh of the sequence, is depicted in Figure 3.

4.1.  Multigrid  Saw-Tooth  Cycle 

For a given sequence of meshes, the multigrid saw-tooth cycle is initiated by performing a single time step on the finest mesh of the sequence. The flow variables and residuals are then transferred to the next coarser grid. The equations on the coarse grids must be modified to ensure that they represent the fine grid solution. If $R'$ represents the transferred residuals and $w'$ the transferred flow variables, a forcing function on the coarse grid may be defined as

$$P = R' - R(w').$$

Now, on the coarse grids, time stepping proceeds as

$$w^{(q)} = w^{(0)} - \alpha_q \frac{\Delta t}{S}\big(R(w^{(q-1)}) + P\big)$$

for the $q$th stage. In the first stage, $w^{(q-1)}$ reduces to the transferred flow variable $w'$. Thus, the calculated residuals on the coarse grid are canceled by the second term in the forcing function $P$, leaving only the $R'$ term. This indicates that the coarse grid solution is driven by the fine grid residuals. This procedure is repeated on successively coarser grids, performing one time step on each grid level. When the coarsest grid is reached, the corrections are transferred back to the finer grids without any intermediate time stepping.
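The role of the forcing function can be seen in a minimal sketch of one coarse-grid time step. This is our own illustration: NumPy is assumed, `residual` is a hypothetical callable standing in for $R$, the same five-stage coefficients are assumed on the coarse grid, and the grid transfers themselves are not included. If the transferred fine-grid residual $R'$ is zero, every stage reduces to $w^{(q)} = w'$ and no correction is produced. This shows that the coarse grid is driven only by $R'$.

```python
import numpy as np

ALPHAS = (1/4, 1/6, 3/8, 1/2, 1.0)   # assumed to be the same stage coefficients as on the fine grid

def coarse_grid_step(w_transferred, R_transferred, residual, dt_over_S, alphas=ALPHAS):
    """One multi-stage step on a coarse grid driven by the forcing function P = R' - R(w')."""
    P = R_transferred - residual(w_transferred)
    w0 = w_transferred
    wq = w0
    for alpha in alphas:
        # stage q: w^(q) = w^(0) - alpha_q (dt/S) (R(w^(q-1)) + P); in stage 1 this is R'
        wq = w0 - alpha * dt_over_S * (residual(wq) + P)
    return wq   # the correction w_new - w' is what gets transferred back to the finer grid
```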

4.2. Grid Transfers

Flow  variables,  residuals,  and  corrections  are  transferred  between  coarse  and  fine  grids  in 
different  manneis.  Flow  variables  at  a  coarse  grid  node  P  are  taken  as  the  linear  interpolation 
of  the  corresponding  values  at  nodes  1,  2,  and  3,  as  shown  in  Figure  2,  which  are  the  vertices 
of  the  fine  grid  triangle  enclosing  P.  These  three  nodes  include  the  fine  grid  node  which  is 
closest  to  P,  thus  ensuring  an  accurate  representation  of  the  flow-field  on  the  coarse  grid.  The 
fine  grid  residual  Ra  at  "a"  in  Figure  2  is  linearly  distributed  to  the  coarse  grid  nodes  A,  B,  and 
C,  which  are  the  vertices  of  the  coarse  grid  triangle  enclosing  ”a”.  This  linear  distribution  is 
accomplished  by  the  use  of  shape  functions  which  have  the  value  1  at  one  of  the  coarse  grid 


Grid  4 

Grid  3 

Grid  2 

Grid  1 
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Figure  3 

Full  Multigrid  Algorithm  using  the  Saw-Tooth  Cycle 
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triangle  vertices,  and  vanish  at  the  other  two  vertices.  This  implies  that  the  sum  of  the  residual 
contribution  to  A,  B,  and  C  equals  the  residual  at  "a",  and  the  weighting  is  such  that,  if  "a” 
and  A  coincide,  then  the  contribution  at  A  is  equal  to  Ra,  and  the  contributions  at  B  and  C  van¬ 
ish.  This  type  of  transfer  is  conservative.  When  transferring  the  corrections  from  the  coarse 
grid  back  to  the  fine  grid,  a  simple  linear  inteipolation  formula  is  used.  Thus,  the  correction  at 
the  fine  grid  node  "a”  is  taken  as  the  linear  interpolation  of  the  three  corrections  at  nodes  A,  B, 
and  C  which  enclose  "a”  on  the  coarse  grid. 
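The linear shape functions used for both transfers are the barycentric coordinates of a point in a triangle. The Python sketch below is an illustration of that idea, with hypothetical function names; it checks the two properties stated above: the distributed residual contributions sum to $R_a$, and a point coinciding with vertex A sends its whole residual to A.

```python
import numpy as np

def shape_functions(p, tri):
    """Linear (barycentric) shape functions of point p in triangle tri.

    tri is a (3, 2) array of vertex coordinates (A, B, C). Each weight equals 1
    at its own vertex and 0 at the other two; the three weights sum to 1.
    """
    (xa, ya), (xb, yb), (xc, yc) = tri
    x, y = p
    det = (xb - xa) * (yc - ya) - (xc - xa) * (yb - ya)  # twice the signed area
    wb = ((x - xa) * (yc - ya) - (xc - xa) * (y - ya)) / det
    wc = ((xb - xa) * (y - ya) - (x - xa) * (yb - ya)) / det
    return np.array([1.0 - wb - wc, wb, wc])

def distribute_residual(r_a, p, tri):
    # Fine-to-coarse: split the fine grid residual R_a among vertices A, B, C.
    return r_a * shape_functions(p, tri)

def interpolate_correction(corr_abc, p, tri):
    # Coarse-to-fine: linearly interpolate the corrections at A, B, C to point p.
    return float(shape_functions(p, tri) @ corr_abc)
```

The same weights serve both directions, which is why the residual transfer is conservative while the correction transfer is a plain interpolation.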

4.3. Search Algorithm

The remaining difficulty lies in the determination of the nodes A, B, and C to be associated with each fine grid node "a". This is equivalent to the problem of locating the address of the coarse grid cell which encloses a particular fine grid node. A naive search over all the coarse grid cells would require $O(N^2)$ operations, where N is the number of grid points, and thus would be prohibitively expensive, requiring more time than the flow solution itself. Hence, an efficient search algorithm is needed. In this work, a tree-search algorithm has been adopted. It requires that information about the neighbors of each node or cell be stored for both the coarse and fine grids. It is initiated by providing an initial guess $IC_1$ for the coarse grid cell, and then testing $IC_1$ to see if it encloses the fine grid node $NF$. Since we are free to begin the search with any fine grid node and any coarse grid cell, we choose points whose locations are known (such as trailing edge values). If the test is negative, then the neighbors of $IC_1$ are tested. If these tests also fail, then the neighbors of these neighbors are tested. This process is continued until, after n tries, the address $IC_n$ of the cell enclosing $NF$ is located. This entire procedure is repeated for every node of the fine grid. The next fine grid node $NF_2$ is chosen as a neighbor of $NF$, and the initial guess for the enclosing cell is taken as $IC_n$, the coarse grid cell which is now known to enclose the previous $NF$. In this manner, we are assured of a good initial guess, since $IC_n$ and $NF_2$ must be located in the same region of the computational domain. This type of search can be achieved in $O(N \log N)$ operations. In practice, of the order of 10 searches are required to locate an enclosing cell, and this number is found to be insensitive to the size of the mesh. Because this operation is performed only once, just after the generation of the new mesh, the total amount of work involved is negligible when compared with the flow solution phase.
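The search can be sketched in Python as a breadth-first walk outward from the initial guess, which is one straightforward reading of "test the guess, then its neighbors, then the neighbors of those neighbors". The data layout is an assumption for illustration: coarse cells are vertex-index triples, `neighbors[c]` lists the cells sharing a side with cell c, and -1 marks a boundary side. All function names are hypothetical.

```python
from collections import deque

def point_in_triangle(p, a, b, c, eps=1e-12):
    # p is inside (or on the edge of) the triangle if it does not lie
    # strictly to the left of one edge and strictly to the right of another.
    def cross(o, u, v):
        return (u[0] - o[0]) * (v[1] - o[1]) - (u[1] - o[1]) * (v[0] - o[0])
    d = (cross(a, b, p), cross(b, c, p), cross(c, a, p))
    has_neg = any(x < -eps for x in d)
    has_pos = any(x > eps for x in d)
    return not (has_neg and has_pos)

def find_enclosing_cell(p, guess, coords, cells, neighbors):
    """Breadth-first search from an initial guess; returns (cell, cells tested)."""
    seen = {guess}
    queue = deque([guess])
    tries = 0
    while queue:
        ic = queue.popleft()
        tries += 1
        a, b, c = (coords[k] for k in cells[ic])
        if point_in_triangle(p, a, b, c):
            return ic, tries
        for nb in neighbors[ic]:
            if nb >= 0 and nb not in seen:   # -1 marks a boundary side
                seen.add(nb)
                queue.append(nb)
    raise ValueError("point lies outside the coarse mesh")

def locate_all(fine_order, fine_coords, start_cell, coords, cells, neighbors):
    # Visit fine nodes so that consecutive nodes are neighbors, and reuse the
    # previously found enclosing cell as the next initial guess.
    result, guess = {}, start_cell
    for nf in fine_order:
        guess, _ = find_enclosing_cell(fine_coords[nf], guess, coords, cells, neighbors)
        result[nf] = guess
    return result
```

Because each search starts next to the answer, the number of cells tested per node stays small, consistent with the roughly 10 searches per node reported above.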

5.  MESH  GENERATION 

Since  the  unstructured  multigrid  algorithm  assumes  the  coarse  and  fine  meshes  of  the 
multigrid  sequence  are  independent  of  one  another,  any  suitable  mesh  generation  scheme  may 
be  employed.  In  this  work,  two  approaches  are  illustrated,  one  where  the  global  mesh  point 
distribution  is  determined  by  conformal  mapping  techniques,  and  one  where  the  mesh  point 
distribution  is  determined  by  adaptive  refinement  techniques. 

For both cases, the generation of unstructured triangular meshes is accomplished in three independent steps. First, a distribution of mesh points in the flow-field is determined. These points are then joined together by line segments to form a set of triangular elements using the Delaunay triangulation algorithm. There exist many ways of triangulating a given set of points; the Delaunay algorithm represents a unique construction of this type. It also has the desirable property of minimizing the aspect ratios of the triangular cells. Further details on Delaunay triangulation can be found in [4,11]. The resulting mesh is then post-processed by a smoothing filter which slightly repositions the mesh points to ensure a distribution of smoothly varying elements. The new position of a mesh point i is calculated as

$$ x_i^{new} = x_i^{old} + \frac{\omega}{n} \sum_{k=1}^{n} \left( x_k - x_i^{old} \right) $$

with a similar expression for the y-coordinate. Here ω is a relaxation factor, and the sum is over all n edges meeting at point i, $x_k$ being the other end point of edge k.
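A small Python sketch of this smoothing filter follows. Assumptions not stated in the text: all points are updated simultaneously from the old positions in each sweep, and a set of `fixed` points (for example, points on the airfoil surfaces and the outer boundary) is held in place. The function name and parameters are hypothetical.

```python
import numpy as np

def smooth_mesh(xy, edges, omega=0.5, fixed=(), sweeps=1):
    """x_i <- x_i + (omega / n_i) * sum_k (x_k - x_i), applied to x and y.

    xy    : (N, 2) array of point coordinates
    edges : iterable of (i, k) index pairs, one per mesh edge
    fixed : indices of points that are not moved (assumed boundary points)
    """
    xy = np.asarray(xy, dtype=float).copy()
    n_pts = len(xy)
    fixed = set(fixed)
    for _ in range(sweeps):
        delta = np.zeros_like(xy)   # sum of (x_k - x_i) over edges meeting at i
        count = np.zeros(n_pts)     # n_i: number of edges meeting at point i
        for i, k in edges:
            delta[i] += xy[k] - xy[i]
            delta[k] += xy[i] - xy[k]
            count[i] += 1
            count[k] += 1
        for i in range(n_pts):
            if i not in fixed and count[i] > 0:
                xy[i] += omega * delta[i] / count[i]
    return xy
```

With `omega = 1` a free point moves to the centroid of its edge-neighbors; smaller values of ω reposition it only slightly, as the text describes.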

5.1.  Mesh  Point  Distribution  by  Conformal  Mapping 

Conformal  mapping  techniques  are  used  to  generate  global  mesh  point  distributions  about 
the  multi-element  airfoil  configurations  studied  in  this  work.  Each  airfoil  element  of  the 
configuration  can  be  mapped  to  a  circle  by  the  application  of  a  Karman-Trefftz  transformation, 
followed  by  a  shearing  transformation.  The  resulting  circle  is  fitted  with  a  polar  mesh.  Upon 
mapping  the  circle  back  to  the  airfoil,  a  body-fitted  regular  quadrilateral  O-mesh  is  obtained. 
When  this  procedure  is  repeated  for  each  element  of  the  configuration,  a  series  of  overlapping 
O-meshes is obtained. If the mesh cells are ignored, and the points which overlap with neighboring airfoil elements are omitted, a distribution of points in the flow-field is obtained. These
points  are  then  used  as  a  basis  for  the  triangulation  procedure.  Global  refinement  is  achieved 
by  prescribing  twice  as  many  points  in  the  radial  and  circumferential  directions  of  each  mapped 
airfoil  element,  and  remeshing  the  new  point  distribution. 

5.2. Mesh Point Distribution by Adaptive Techniques

Adaptive mesh techniques offer the advantage of obtaining higher solution accuracy with fewer mesh points. This is achieved by concentrating the mesh points only in areas where large discretization errors are observed. In principle, any type of adaptive meshing technique may be employed, since the present multigrid algorithm is decoupled from the mesh generation procedure. Presently, a simple refinement technique is employed, based on the extensive investigation of Dannenhoffer [12] for structured quadrilateral grids. The undivided first difference of density is used as a refinement criterion, since the density varies with all important flow features. For each edge of the mesh, that is, any line segment of the mesh which joins two nodes, the difference of the density between the two end nodes is examined. If this difference is larger than some fraction (taken as 0.5 in this work) of the RMS average difference over all mesh edges, a new mesh point is created midway along that edge. For mesh edges approximating a curved boundary, such as the airfoil surfaces, the new mesh point will not coincide with the boundary and must be projected back onto the airfoil surface. Once all new mesh points have been determined, they are combined with the old mesh points and retriangulated. Splitting along edges in such a manner, rather than subdividing entire triangular cells, avoids the introduction of unnecessary mesh points and offers the possibility of directional refinement.
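The edge-based refinement criterion reduces to a few array operations. The Python sketch below, with a hypothetical function name, flags every edge whose undivided density difference exceeds `fraction` times the RMS difference over all edges and returns the edge midpoints as new points. The projection of boundary midpoints back onto the airfoil surface and the subsequent retriangulation are not shown.

```python
import numpy as np

def new_refinement_points(xy, rho, edges, fraction=0.5):
    """Midpoints of edges whose density jump exceeds fraction * RMS jump.

    xy    : (N, 2) node coordinates
    rho   : (N,) density at each node
    edges : (E, 2) integer array of node index pairs
    """
    edges = np.asarray(edges)
    jump = np.abs(rho[edges[:, 0]] - rho[edges[:, 1]])  # undivided first difference
    rms = np.sqrt(np.mean(jump ** 2))                   # RMS over all mesh edges
    flagged = edges[jump > fraction * rms]
    return 0.5 * (xy[flagged[:, 0]] + xy[flagged[:, 1]])
```

Because the threshold is relative to the RMS value, the criterion adapts to the overall solution level rather than requiring an absolute density tolerance.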

In  both  of  the  above  cases,  the  new  refined  mesh  point  distribution  may  contain  points 
from  the  previous  coarser  mesh.  However,  the  connectivity  of  these  meshes  is  determined  by 
the retriangulation procedure, and, in general, the sequence of meshes will be non-nested. Furthermore, the mesh points are displaced in the post-processing smoothing operation, and thus none
of  the  refined  mesh  points  will  coincide  with  the  previous  coarse  mesh  points.  Thus,  for  both 
cases,  the  coarse  and  fine  meshes  of  the  sequence  are  independent  from  each  other. 

6.  RESULTS 

Results are presented for a two-element airfoil system in transonic flow. The basic configuration, which consists of a main airfoil fitted with a leading edge slat, has been the subject of an extensive study to determine the effectiveness of slats as a transonic maneuvering aid for fighter configurations [13]. The Mach number is 0.7, and the angle of attack is 2.8°. Figure 4 depicts the sequence of four globally refined meshes used in the full multigrid algorithm. The finest mesh contains a total of 5629 points. The computed pressure distribution on the finest mesh of this sequence is shown in Figure 5. Large suction peaks are evident near the leading edges of both airfoil elements. A very small supersonic zone terminating with a shock is visible on the lower surface of the slat, near its leading edge. A strong shock at about mid-chord on the main airfoil is also observed. The convergence rate of the multigrid algorithm on this sequence of meshes is shown in Figure 6, as measured by the RMS average of the density residuals in the flow-field. On the finest grid, an average residual reduction factor of 0.897 per multigrid cycle is observed, reducing the residuals by about 5 orders of magnitude in 100 cycles. The convergence rate is roughly the same on all meshes of the sequence, thus validating the effectiveness of the multigrid algorithm.

[Figure 4: Sequence of Meshes Generated by Global Refinement for the Unstructured Multigrid Algorithm. Mesh 1: 114 Nodes; Mesh 2: 382 Nodes; Mesh 3: 1458 Nodes; Mesh 4: 5629 Nodes (Partial View of Meshes Only)]
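As a quick arithmetic check relating the per-cycle reduction factor to the total reduction: a constant factor r per cycle reduces the residual by $r^N$ after N cycles, so for r = 0.897 and N = 100

$$ 0.897^{100} = 10^{100 \log_{10} 0.897} \approx 10^{100 \times (-0.0472)} \approx 10^{-4.7}, $$

while an exact reduction of 5 orders of magnitude in 100 cycles would correspond to $10^{-5/100} = 10^{-0.05} \approx 0.891$ per cycle. The quoted factor and the quoted overall reduction are therefore consistent to within rounding.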

[Figure 5: Surface Pressure Distribution on the Main Airfoil and the Leading Edge Slat Calculated on the Finest Mesh of the Globally Refined Mesh Sequence. Mach = 0.7, Incidence = 2.8°]

[Figure 6: Convergence Rate as Measured by the RMS Average of the Density Residuals throughout the Flow-field versus the Number of Multigrid Cycles for the Globally Refined Mesh Sequence Beginning on the Second Mesh of the Sequence]

The same case has been computed with adaptively generated meshes. A sequence of six meshes is employed. The first two meshes are identical to the two coarsest meshes of the globally refined sequence in Figure 4. The next four meshes of the sequence, depicted in Figure 7, were obtained by successive adaptive refinements. The finest mesh of the sequence contains 4697 points, roughly 16% fewer than the finest mesh of Figure 4. Figure 8 shows the surface pressure distribution computed on this mesh. The accuracy of this solution is clearly superior to that of Figure 5. The definition of the shock on the main airfoil as well as the shock on the slat is much sharper than in the former case, due to the higher density of mesh points in these regions. The suction peaks on both airfoils, as well as the "hook" on the lower surface near the leading edge of the main airfoil, are resolved in much better detail. The lower surface of the main airfoil contains fewer points than in the previous case. However, the accuracy of the solution is not affected, since no large flow gradients are present in this region. On the other hand, the resolution at the trailing edge of the main airfoil is somewhat lower than desired. Figure 9 shows the convergence rate for this sequence of meshes. An average residual reduction factor of 0.895 per multigrid cycle is achieved on the finest mesh, reducing the residuals by about 5 orders of magnitude in 100 cycles. This rate is also roughly the same on all meshes of the sequence. The adaptive mesh technique thus produces more accurate solutions for less work. A solution with accuracy equivalent to the globally refined mesh solution depicted in Figure 4 can be obtained with roughly 1/3 the number of mesh points.

[Figure 7: Sequence of 4 Adaptively Generated Meshes Used in the Multigrid Algorithm in Conjunction with the 2 First Meshes of Figure 4. Mesh 3: 790 Nodes; Mesh 4: 1631 Nodes; Mesh 5: 3107 Nodes; Mesh 6: 4697 Nodes (Partial View of Meshes Only)]

[Figure 8: Surface Pressure Distribution on the Main Airfoil and the Leading Edge Slat Calculated on the Finest Mesh of the Adaptively Refined Mesh Sequence. Mach = 0.7, Incidence = 2.8°]

[Figure 9: Convergence Rate as Measured by the RMS Average of the Density Residuals throughout the Flow-field versus the Number of Multigrid Cycles for the Adaptively Refined Mesh Sequence Beginning on the Second Mesh of the Sequence]


For both cases, the multigrid convergence rates are comparable to convergence rates obtained with a structured multigrid Euler solver [14]. A better assessment of the real efficiency of the present multigrid algorithm is given in Figures 10 and 11, where the multigrid convergence rates for both of the above cases are plotted versus the number of work units. A work unit is defined as the amount of CPU time required to perform a single grid cycle on the finest mesh of the multigrid sequence. For comparison, the appropriate single grid convergence rates are also plotted on the same figures. The multigrid convergence histories include the time spent calculating transfer addresses and coefficients, performing inter-grid transfers of variables, and time stepping on coarse grids. They do not, however, include the mesh generation time. In all cases, the time spent calculating the transfer addresses and coefficients between any two meshes is of the order of 2 to 3 multigrid cycles on the newly generated mesh. In any particular multigrid saw-tooth cycle, the total fraction of time spent transferring variables back and forth between meshes is about 2%. The solution efficiency of the adaptive mesh sequence was found to be less than optimal, since time stepping occurs on all mesh levels, even in regions such as the far-field, where no mesh refinement takes place. In fact, little or no deterioration in the multigrid convergence rate was observed for this case when time stepping on every second mesh of the sequence in the saw-tooth cycle was omitted.

[Figure 10: Convergence Rate as Measured by the Number of Work Units for the Full Multigrid Algorithm on the Globally Refined Mesh Sequence Beginning on the Second Mesh of the Sequence Compared with the Convergence Rate on the Single Finest Grid of the Sequence (horizontal axis: work units)]

[Figure 11: Convergence Rate as Measured by the Number of Work Units for the Full Multigrid Algorithm on the Adaptively Refined Mesh Sequence Beginning on the Second Mesh of the Sequence Compared with the Convergence Rate on the Single Finest Grid of the Sequence (horizontal axis: work units)]


7.  CONCLUSION 

The  idea  of  uncoupling  the  multigrid  algorithm  from  the  grid  generation  procedure  is  an 
effective  means  for  accelerating  the  convergence  to  a  steady  state  of  the  Euler  equations  on 
arbitrary  grids.  The  adaptive  meshing  technique  produces  significant  increases  in  efficiency 
over global mesh refinement. Further work is required to determine more effective adaptive criteria, and to optimize the amount of work spent on each grid of adaptively generated multigrid
sequences. 
A numerical method for solving the isenthalpic form of the governing equations for compressible inviscid flows on general curvilinear coordinate systems is described. The method is based on the concept of flux vector splitting in its implicit form and is tested on several demanding configurations. Time marching to steady state is accelerated by the implementation of Full Approximation Storage (FAS) multigrid. High-quality, second-order accurate, steady-state results are obtained for various test cases.


1. INTRODUCTION

Many algorithms for iteratively calculating inviscid compressible flows have been developed. Perhaps the oldest is the explicit MacCormack [1] scheme, dating back to 1969. Next came the implicit (three-factor ADI) method [2], [3], using central differences for the spatial flux derivatives. The explicit, multistage Runge-Kutta method with central differences for the spatial derivatives [4] and multigrid acceleration [5] followed.

In references [6]-[8], a full formulation of the Euler equations was used. In references [9] and [10], an isenthalpic formulation was used, which reduced the three-dimensional problem to a set of four partial differential equations. The energy equation was replaced by an algebraic expression.

This  paper  is  U.S.  government  work,  cannot  be 
copyrighted,  and  lies  in  the  public  domain. 




Isenthalpic  Form  of  the  Compressible  Flow  Equations 


The present effort continues the work by von Lavante and uses the isenthalpic assumption in two dimensions. With the governing equations reduced to three partial differential equations, only 3x3 blocks have to be handled in the block tridiagonal system of equations. This requires about one-half as much work as solving the 4x4 block tridiagonal systems that arise if the isenthalpic assumption is not made (9 versus 16 elements per block).

In the present method, with the isenthalpic assumption, the conservation of total enthalpy is assured a priori. Jameson uses an enthalpy damping acceleration technique [5], which introduces an artificial enthalpy term that is eliminated at convergence, after which total enthalpy is conserved. Many investigators have applied the enthalpy damping acceleration technique introduced by Jameson [5], have used this technique to predict very complex two- and three-dimensional configurations, and have reported results that were in good agreement with experimental data. Solutions of the isenthalpic Euler equations should therefore compare well with those from the Jameson methods.

The  isenthalpic  assumption  is  not  without  its  drawbacks.  First  of  all,  it  is 
limited  to  steady-state  calculations  since  the  substantial  derivatives  of  the 
total  enthalpy  and  pressure  are  related.  Second,  for  viscous  calculations, 
the maximum freestream Mach number is limited to transonic and moderate
supersonic  values  where  heating  effects  are  not  important.  Higher  freestream 
Mach  numbers  may  be  considered  if  the  Prandtl  number  is  unity  (Pr  =  1)  and  the 
wall  is  treated  as  adiabatic.  Finally,  viscous  results  can  only  be  considered 
approximate,  since  in  real  flows  the  total  enthalpy  changes  within  the 
boundary  layer.  Notwithstanding  these  limitations,  the  present  scheme  works 
well  and  produces  good  quality  results  in  cases  where  the  flow  is  steady. 

The  work  required  to  obtain  a  steady-state  solution  is  further  reduced  by  the 
use  of  multigrid  acceleration.  The  Full  Approximation  Storage  multigrid 
scheme  is  used.  Various  V-cycle  strategies,  as  well  as  W-cycles  and  full- 
multigrid,  have  been  studied. 

2. THE EQUATIONS OF MOTION

There  is  a  large  class  of  problems  where  only  steady-state  solutions  are  of 
interest.  For  inviscid  flows,  the  assumption  of  steady-state  flow  reduces  the 
energy  equation  to  the  simple  statement  that  in  the  absence  of  heat  sources 
and  sinks,  the  total  enthalpy  will  remain  constant.  The  energy  equation  is 
therefore  replaced  by  a  simple  algebraic  equation,  reducing  the  number  of 
partial-differential  equations  to  be  solved  by  one. 
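The algebraic relation that replaces the energy equation can be illustrated with a short Python function. It uses only standard perfect-gas relations: static enthalpy $h = \frac{\gamma}{\gamma - 1} p/\rho$ and total enthalpy $H_0 = h + (u^2 + v^2)/2$, so $p = \frac{\gamma - 1}{\gamma} \rho \left( H_0 - \frac{u^2 + v^2}{2} \right)$. The default $H_0 = 1/(\gamma - 1)$ assumes velocities scaled by the stagnation speed of sound; this scaling is an assumption for the sketch, and the function name is hypothetical.

```python
def isenthalpic_pressure(rho, u, v, gamma=1.4, h0=None):
    """Static pressure from constant total enthalpy (steady, adiabatic flow).

    p = (gamma - 1) / gamma * rho * (H0 - (u^2 + v^2) / 2)
    Default H0 = 1 / (gamma - 1): velocities assumed scaled by the stagnation
    speed of sound c0, pressure by rho0 * c0^2 (assumed nondimensionalization).
    """
    if h0 is None:
        h0 = 1.0 / (gamma - 1.0)
    return (gamma - 1.0) / gamma * rho * (h0 - 0.5 * (u * u + v * v))
```

At rest with unit density this gives $p = 1/\gamma$, i.e. $c^2 = \gamma p / \rho = 1$, the stagnation speed of sound in these units.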
The two-dimensional compressible Euler equations for general, body-fitted coordinates written in nondimensional strong conservation law form are:

$$ \frac{\partial Q}{\partial t} + \frac{\partial F}{\partial \xi} + \frac{\partial G}{\partial \eta} = 0 \qquad (1) $$
Using the definition of the speed of sound at stagnation conditions, $c_0^2 = \gamma R T_0$, the nondimensional total enthalpy is

$$ H_0 = \frac{c_0^2}{\gamma - 1} $$

resulting in the following form of the equation of state

$$ p = \frac{\rho}{\gamma} \left\{ c_0^2 - \frac{\gamma - 1}{2} \left( u^2 + v^2 \right) \right\} \qquad (3) $$

where ρ is density, u and v are the Cartesian velocity components, and p is static pressure. All variables are nondimensionalized by the stagnation values (for details, see ref. [10]); the primes denoting nondimensional quantities in reference [10] are dropped for convenience. The metric coefficients of the transformation of coordinates are defined as

$$ \xi_x = J y_\eta, \qquad \xi_y = -J x_\eta, \qquad \eta_x = -J y_\xi, \qquad \eta_y = J x_\xi $$
where J is the Jacobian of the transformation

$$ J = \frac{1}{x_\xi y_\eta - x_\eta y_\xi} $$

and U and V are the contravariant velocities

$$ U = \xi_x u + \xi_y v, \qquad V = \eta_x u + \eta_y v $$


3.  DEVELOPMENT  OF  ALGORITHM 

An implicit Euler single-step temporal scheme was selected for advancing the solution of equation (1) in time. After linearization in time using Taylor series expansions of the flux vectors F and G and approximate factorization of the implicit operator (details are given in ref. [3]), the basic algorithm has the form

$$ \left[ I + \Delta t \, \partial_\xi A^n \right] \left[ I + \Delta t \, \partial_\eta B^n \right] \Delta Q^n = -\Delta t \left( \delta_\xi F^n + \delta_\eta G^n \right) = R^n \qquad (7) $$

where A and B are the Jacobian matrices

$$ A = \frac{\partial F}{\partial Q}, \qquad B = \frac{\partial G}{\partial Q} \qquad (8) $$


The Jacobian matrices A and B are given in detail in reference [10]. The spatial discretization of equation (7) can be carried out in many different ways. In the present method, the flux vector splitting approach applied to a cell-centered finite volume formulation was selected, mainly due to its superior ability to capture relatively strong shocks within at most two zones. It can also be shown that its truncation error provides the minimum necessary damping to limit spurious oscillations in the weak solutions to the Euler equations. Based on previous experience reported in reference [10] as well as results presented in reference [8], it was decided to use the flux vector splitting introduced by van Leer [12] coupled with the so-called MUSCL-type differencing. The van Leer splitting was selected because the split flux vectors are smooth and have smooth first derivatives with respect to the Mach number, so that their eigenvalues are also smooth.

The Jacobians of the inviscid flux vectors F and G each have a complete set of three real eigenvalues, so each flux can be split into two vectors, one with non-negative eigenvalues and one with non-positive eigenvalues. Following reference [12], these are

$$ F = F^+ + F^-, \qquad G = G^+ + G^- \qquad (9) $$

where, for example, $F^+ = \left( F_1^+, F_2^+, F_3^+ \right)^T$.
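For reference, the standard one-dimensional van Leer splitting can be written for the three fluxes carried by the isenthalpic system (mass, normal momentum, tangential momentum). The Python sketch below is an illustration in local Cartesian form, with u the velocity normal to the face and v the tangential one; the rotation to the ξ or η direction described next, and the paper's own implementation details, are not reproduced. The function name is hypothetical.

```python
import numpy as np

def van_leer_split(rho, u, v, p, gamma=1.4):
    """Van Leer splitting of F = (rho u, rho u^2 + p, rho u v) into F+ and F-.

    Scalar inputs; u is the face-normal and v the tangential velocity.
    """
    c = np.sqrt(gamma * p / rho)
    m = u / c
    full = np.array([rho * u, rho * u * u + p, rho * u * v])
    if m >= 1.0:                      # supersonic to the right: F- vanishes
        return full, np.zeros(3)
    if m <= -1.0:                     # supersonic to the left: F+ vanishes
        return np.zeros(3), full
    split = []
    for s in (1.0, -1.0):             # s = +1 gives F+, s = -1 gives F-
        f_mass = s * rho * c * (m + s) ** 2 / 4.0
        split.append(np.array([f_mass,
                               f_mass * ((gamma - 1.0) * u + s * 2.0 * c) / gamma,
                               f_mass * v]))
    return split[0], split[1]
```

For subsonic flow the two parts sum exactly to the unsplit flux, and both parts vary smoothly with Mach number through M = ±1, which is the smoothness property cited above as the reason for choosing this splitting.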

These split fluxes have to be transformed into the general coordinates ξ and η, which is accomplished by simply rotating the local coordinates at a given point in the flow field to a direction parallel with one of the covariant vectors ξ and η. This procedure is described in some detail in reference [8]; the resulting transformation is


The new dependent variable vector $\bar{Q}$ is obtained from Q by replacing the Cartesian velocity components u and v by the physical velocity components $\bar{u}$ and $\bar{v}$ in the covariant direction ξ. These are, respectively,


Knowing the eigenvalues of the split fluxes, it is now clear that, in the spatial differences in equation (7), $F^+$ and $G^+$ have to be backward differenced and $F^-$ and $G^-$ have to be forward differenced. This is accomplished by the application of MUSCL-type differencing, described in more detail in references [8], [10]. Here, instead of using the traditional backward or forward finite differences operating on $F^+$, $G^+$, $F^-$, and $G^-$, the dependent variables Q, which are smoother than the flux vectors, are extrapolated to the cell faces in the positive or negative direction, depending on the sign of the eigenvalues. The right-hand side of equation (7) becomes
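$$ R^n = -\Delta t \left[ \left( F^+_{i+1/2,j} + F^-_{i+1/2,j} - F^+_{i-1/2,j} - F^-_{i-1/2,j} \right) + \left( G^+_{i,j+1/2} + G^-_{i,j+1/2} - G^+_{i,j-1/2} - G^-_{i,j-1/2} \right) \right] \qquad (11) $$

where, for example,

$$ F^+_{i+1/2,j} = F^+\left( Q^-_{i+1/2,j} \right), \qquad F^-_{i+1/2,j} = F^-\left( Q^+_{i+1/2,j} \right) $$

$$ G^+_{i,j+1/2} = G^+\left( Q^-_{i,j+1/2} \right), \qquad G^-_{i,j+1/2} = G^-\left( Q^+_{i,j+1/2} \right) $$

and

$$ Q^+_{i+1/2,j} = Q_{i+1,j} - \frac{k_s}{2} \left( Q_{i+2,j} - Q_{i+1,j} \right) $$

$$ Q^-_{i+1/2,j} = Q_{i,j} + \frac{k_s}{2} \left( Q_{i,j} - Q_{i-1,j} \right) $$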

etc., with similar expressions in the j-direction. The parameter $k_s$ switches between first-order formulation ($k_s = 0$) and second-order formulation ($k_s = 1$).
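The one-dimensional core of this extrapolation is easy to show in Python. The sketch below, with a hypothetical function name, returns the face states $Q^-_{i+1/2}$ (extrapolated from the left, used in $F^+$) and $Q^+_{i+1/2}$ (extrapolated from the right, used in $F^-$) along one grid line, for every face that has two cells on each side. It illustrates that $k_s = 0$ reduces to first-order upwind values, while $k_s = 1$ reproduces linear data exactly.

```python
import numpy as np

def muscl_states(q, ks=1.0):
    """Face states at i+1/2 along one grid line, for i = 1 .. len(q) - 3.

    Q-_{i+1/2} = Q_i     + ks/2 * (Q_i - Q_{i-1})      (left state, feeds F+)
    Q+_{i+1/2} = Q_{i+1} - ks/2 * (Q_{i+2} - Q_{i+1})  (right state, feeds F-)
    """
    q = np.asarray(q, dtype=float)
    i = np.arange(1, len(q) - 2)
    q_minus = q[i] + 0.5 * ks * (q[i] - q[i - 1])
    q_plus = q[i + 1] - 0.5 * ks * (q[i + 2] - q[i + 1])
    return q_minus, q_plus
```

No limiter is applied, matching the statement below that limiters were not needed for the transonic and low supersonic cases considered.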

The present formulation, when applied to transonic and low supersonic flows, does not require the use of flux limiters to obtain essentially oscillation-free shocks. This was noticed by Anderson, Thomas, and van Leer [8] and by von Lavante and Haertl [10], and was explained in more detail by van Leer [12]. The favorable behavior of the present formulation is due to the fact that, at transonic speeds, the backward-extrapolated flux that is being extrapolated from downstream of the shock is much smaller than the forward-extrapolated flux. Despite the linear extrapolation above, no overshoots, or only very small ones, are encountered.

The implicit left-hand side of equation (7) undergoes similar modifications as the right-hand side. The Jacobian matrices A and B are replaced by the corresponding Jacobians of the split flux vectors, yielding the following form of the left-hand side of equation (7):

$$ \left[ I + \Delta t \left( \delta_\xi^- A^+ + \delta_\xi^+ A^- \right) \right] \left[ I + \Delta t \left( \delta_\eta^- B^+ + \delta_\eta^+ B^- \right) \right] \Delta Q^n = R^n \qquad (12) $$

where $A^\pm = \partial F^\pm / \partial Q$ and $B^\pm = \partial G^\pm / \partial Q$, and $\delta^-$ and $\delta^+$ are first-order backward and forward differences, respectively. A standard block tridiagonal solver was used to solve the system of algebraic equations given by equation (12). It should be noted that the two-factor, block tridiagonal form of the resulting algorithm, given by equations (12) and (11), is relatively easy to vectorize.
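Each factor of equation (12) yields, along one grid line, a block tridiagonal system with 3x3 blocks. The Python sketch below is a generic block Thomas algorithm (block LU without pivoting) for such a system; it is an illustration of the standard technique, not the solver used in the paper, and the function name and argument layout are assumptions.

```python
import numpy as np

def block_thomas(lower, diag, upper, rhs):
    """Solve a block tridiagonal system by block forward elimination and back substitution.

    Row k: lower[k] @ x[k-1] + diag[k] @ x[k] + upper[k] @ x[k+1] = rhs[k]
    lower[0] and upper[-1] are ignored. Blocks are (m, m), e.g. m = 3 here.
    No pivoting across blocks, so diagonal dominance is assumed.
    """
    n = len(diag)
    c_prime = [None] * n
    d_prime = [None] * n
    if n > 1:
        c_prime[0] = np.linalg.solve(diag[0], upper[0])
    d_prime[0] = np.linalg.solve(diag[0], rhs[0])
    for k in range(1, n):
        denom = diag[k] - lower[k] @ c_prime[k - 1]      # eliminated diagonal block
        if k < n - 1:
            c_prime[k] = np.linalg.solve(denom, upper[k])
        d_prime[k] = np.linalg.solve(denom, rhs[k] - lower[k] @ d_prime[k - 1])
    x = [None] * n
    x[-1] = d_prime[-1]
    for k in range(n - 2, -1, -1):                      # back substitution
        x[k] = d_prime[k] - c_prime[k] @ x[k + 1]
    return np.array(x)
```

The work per grid line is dominated by the small dense solves, which is where the 3x3 versus 4x4 block size noted in the introduction matters.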
Two types of boundary conditions were needed for this configuration. On the far-field boundary, the characteristic treatment with correction for a lifting body was employed, as introduced by Thomas and Salas [13]. Here, the entropy, tangential velocity, and the proper Riemann invariant were extrapolated in the
direction  of  normal  velocity;  and  the  remaining  Riemann  invariant  from  the 
opposite  direction  was  either  specified  or  extrapolated.  The  tangential 
velocity  was  corrected  every  iteration  to  account  for  the  point  vortex 
representation  of  the  airfoil.  At  the  solid  body,  static  pressure  was 
extrapolated  from  the  interior  points  to  the  body  along  the  body-normal 
direction  using  the  equation  of  state.  The  flux  vector  G  normal  to  the  solid 
surface  was  not  split,  and  thus  required  only  pressure. 

An  improvement  on  the  convergence  rate  to  steady-state  conditions  was  achieved 
by  the  use  of  local  time  steps.  In  this  procedure,  the  time  step  used  in  each 
of  the  cells  was  determined  from  the  maximum  local  eigenvalue  after  each 
iteration. 
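The text does not give the local time step formula. One common choice, shown here only as an assumed illustration, divides a reference CFL number by the largest spectral radius of the flux Jacobians in the two computational directions, $|U| + c\,|\nabla \xi|$ and $|V| + c\,|\nabla \eta|$, using the contravariant velocities defined earlier. The function name and the use of the maximum (rather than, say, the sum) are assumptions.

```python
import numpy as np

def local_time_step(u, v, c, xi_x, xi_y, eta_x, eta_y, cfl=1.0):
    """Cell-wise time step from the largest local eigenvalue (assumed formula).

    Spectral radii in each computational direction:
      lam_xi  = |U| + c * sqrt(xi_x^2 + xi_y^2)
      lam_eta = |V| + c * sqrt(eta_x^2 + eta_y^2)
    with U = xi_x u + xi_y v and V = eta_x u + eta_y v. Works on arrays.
    """
    U = xi_x * u + xi_y * v
    V = eta_x * u + eta_y * v
    lam_xi = np.abs(U) + c * np.hypot(xi_x, xi_y)
    lam_eta = np.abs(V) + c * np.hypot(eta_x, eta_y)
    return cfl / np.maximum(lam_xi, lam_eta)
```

Since every cell advances at its own stable rate, the solution is no longer time-accurate, which is acceptable here because only the steady state is sought.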

4.  MULTIGRID 

The  Full  Approximation  Storage  (FAS)  multigrid  scheme  was  used  since  the  above 
set  of  equations  is  nonlinear. 

The restriction operator for the flow quantities ρ, ρu, and ρv involved the volume-weighted average of the values at the midcells of the four fine grid cells contained in a coarse grid cell. The restriction operator for the residuals involved a simple summation of the residuals over the four fine grid cells composing the coarse grid cell.

The restriction operations are performed for all interior points of the flow field. At the outer boundaries, only the values of the functions are restricted, with no residual restriction. These values are frozen to the fine grid values and are not updated on the coarse grids, since a lift-correction scheme is used to set the outer-boundary values on the fine grid. The lift-correction scheme was found to be less accurate on the coarse grids and, if applied on the coarser grids, it tended to slow the convergence. At the airfoil surface, the same type of boundary condition was used for all the grids. At the wake cut, flow values at ghost cells were set equal to the flow values from the coincident cells across the wake on all the grids.

The prolongation operation used in the current work is a bilinear interpolation, in the computational space, of the corrections at the four coarse grid cells either containing or adjacent to the fine grid midcell. Volume weighting is not used.
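Both grid transfers can be sketched with NumPy for a cell-centered structured grid in which each coarse cell is made of 2x2 fine cells. Restriction follows the text directly: volume-weighted averages of ρ, ρu, ρv and plain sums of the residuals. For prolongation, bilinear interpolation in index space on a uniform computational grid gives weights 9/16, 3/16, 3/16, 1/16 from the parent cell and its three nearest neighbors. The edge-copy ghost layer is a simplifying assumption that stands in for the boundary treatment described above, and the function names are hypothetical.

```python
import numpy as np

def restrict(q_fine, vol_fine, r_fine):
    """Fine-to-coarse transfer with 2x2 agglomeration of cells.

    q_fine  : (ni, nj, m) cell values (rho, rho u, rho v); ni, nj even
    vol_fine: (ni, nj) cell volumes; r_fine: (ni, nj, m) residuals
    """
    ni, nj = vol_fine.shape
    def blocks(a):  # group each 2x2 patch of fine cells under its coarse cell
        return a.reshape(ni // 2, 2, nj // 2, 2, *a.shape[2:])
    vol_c = blocks(vol_fine).sum(axis=(1, 3))
    q_c = blocks(q_fine * vol_fine[..., None]).sum(axis=(1, 3)) / vol_c[..., None]
    r_c = blocks(r_fine).sum(axis=(1, 3))     # residuals are simply summed
    return q_c, r_c

def prolong(corr_coarse):
    """Bilinear (index-space) interpolation of coarse corrections to fine cells."""
    ni, nj, m = corr_coarse.shape
    c = np.pad(corr_coarse, ((1, 1), (1, 1), (0, 0)), mode="edge")  # assumed ghost layer
    fine = np.empty((2 * ni, 2 * nj, m))
    ctr = c[1:-1, 1:-1]
    for di in (0, 1):
        si = -1 if di == 0 else 1              # nearer coarse neighbor in i
        for dj in (0, 1):
            sj = -1 if dj == 0 else 1          # nearer coarse neighbor in j
            nb_i = c[1 + si:ni + 1 + si, 1:-1]
            nb_j = c[1:-1, 1 + sj:nj + 1 + sj]
            nb_ij = c[1 + si:ni + 1 + si, 1 + sj:nj + 1 + sj]
            fine[di::2, dj::2] = (9 * ctr + 3 * nb_i + 3 * nb_j + nb_ij) / 16.0
    return fine
```

Away from the boundaries the prolongation reproduces any field that is linear in the computational indices, which is the defining property of bilinear interpolation.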
A flexible logic structure is built into the multigrid code to allow the study of various multigrid strategies. Fixed V- or W-cycles are allowed and are controlled by input. The number of iterations on each grid between restrictions and prolongations is controllable by input, and either first-order ($k_s = 0$) or second-order ($k_s = 1$) approximations can be used, in any
combination,  on  each  of  the  grids.  Local  time  stepping  is  employed  on  each 
grid  with  a  reference  CFL  number  input  for  each  grid. 

Work units are accounted for precisely on the basis of CPU time. The CPU time
required to perform one fine-grid iteration is taken as one work unit. The
work for every other grid iteration and grid transfer is then defined as the
CPU time of that operation divided by the CPU time of one fine-grid iteration.
This accounting is used throughout the present work.
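The work-unit bookkeeping reduces to a single division. The sketch below assumes the caller has measured CPU times for each operation. The timings in the check are hypothetical and were not reported in the paper.

```python
def work_units(op_cpu_seconds, fine_iteration_cpu_seconds):
    """Convert measured CPU times into work units (WU).

    One WU is the CPU time of a single fine-grid iteration. Every other
    operation (coarse-grid iterations, restrictions, prolongations) is
    charged its own CPU time divided by that reference time.
    """
    if fine_iteration_cpu_seconds <= 0:
        raise ValueError("reference fine-grid iteration time must be positive")
    return sum(op_cpu_seconds) / fine_iteration_cpu_seconds

# Hypothetical timings: one fine iteration (2.0 s), two coarse iterations
# (0.5 s each) and one grid transfer (0.25 s).
cost = work_units([2.0, 0.5, 0.5, 0.25], 2.0)
```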

5.  RESULTS 

The present Euler method was tested on several standard NACA 0012 airfoil test
cases at various Mach numbers and angles of attack. All calculations were
performed on the NASA Numerical Aerodynamic Simulator (NAS), a CRAY-2 with
64-bit words.

An elliptic grid generation technique was used to generate a 209×33 cell C-grid,
which was used for all calculations presented here. The grid was constructed
to be nearly orthogonal at the airfoil surface. The upper and lower airfoil
surfaces had 72 cells each, and each side of the wake had 32 cells. The normal
spacing at the airfoil leading edge was 0.0025 times the chord. The chordwise
spacing at the leading edge was 0.002 times the chord. The outer boundary was
10 chords from the airfoil.


Table 1. Comparison of Results for NACA 0012 with Ref. [14]

                          Present                      Ref. [14]
M∞, α           C_l              C_d             C_l          C_d
0.8, 0°      -0.3702×10^-14    0.86×10^-2        n/a       0.82×10^-2
0.8, 1.25°      0.3633           0.0233         0.3618       0.0236
0.63, 2.0°      0.3306           0.0008          n/a          n/a
Since the flow results are independent of the iteration strategy (assuming
proper convergence), the solutions for the three test cases are presented
first; the multigrid acceleration is then discussed. The three test cases are
at the following flow conditions: a freestream Mach number of 0.8 (M∞ = 0.8) at
an angle of attack of 0° (α = 0°), M∞ = 0.8 at α = 1.25°, and M∞ = 0.63 at
α = 2°. A summary of the lift and drag results is given in table 1.

(a) M∞ = 0.8, α = 0°.- This supercritical case is used to test the ability of
the numerical method to preserve the symmetry of the flow. The correct lift
coefficient C_l is zero, while the drag coefficient C_d is nonzero because of
the shocks. The present method predicted C_l = 1.41×10^-8, a value acceptably
close to zero. The predicted drag was C_d = 0.0087, which is in very good
agreement with previous results (ref. [14]). Mach number contours, pressure
contours, and sonic lines are shown in figure 1. The shocks were captured
within at most two zones and are very crisp. The Mach contours are very
smooth, indicating the absence of spurious oscillations. At these conditions,
all runs were made at the multigrid optimum CFL number of 10.5.

(b) M∞ = 0.8, α = 1.25°.- This supercritical lifting case is well suited for
testing the performance of the boundary conditions, since the lift is very
sensitive to their influence. The present method gave C_l = 0.3675 and
C_d = 0.0237. These results are again in good agreement with data published by
many investigators; see, for example, references [8], [14], [15]. The range of
best results given in these papers was C_l = 0.3632-0.3661 and
C_d = 0.0229-0.0230, achieved on grids that extended up to 96 chords from the
airfoil. The comparison with results published by Anderson et al. (ref. [8])
was also favorable; they reported C_l = 0.363 and C_d = 0.0234. The
corresponding Mach number and pressure contours are shown in figure 2. The
shock on the upper surface was again very well captured. All runs at these
conditions were made at the multigrid optimum CFL number of 21.

(c) M∞ = 0.63, α = 2°.- In this subcritical case, the main difficulty is the
correct prediction of drag: in the absence of shocks, C_d should be zero. The
present scheme computed C_l = 0.3304 and C_d = 0.0007. Both values are in
reasonable agreement with the results reported by Anderson, Thomas, and van
Leer [8], given as C_l = 0.332 and C_d = 0.0006. The Mach number and pressure
contours can be seen in figure 3. The flow is subcritical, with no supersonic
points. All calculations for this flow condition used the optimum CFL number
of 14.0.
Table 2. Comparison of Work Required to Obtain C_l Within ±1% for Various
Multigrid Strategies

                                    Work units
Strategy                  M∞ = 0.8, α = 1.25°   M∞ = 0.63, α = 2.0°
Single grid                       180                   169
V, 2-m iterations                  90                    75
V, FMG, 2 iterations               41                    48
V, FMG, 2-m iterations             46                    43
W, FMG, 2 iterations               30                    38
W, FMG, 2-m iterations             58                    54

Note: All multigrid calculations used four grids.

The successful acceleration of the iteration technique is demonstrated by the
following calculations. A single-grid calculation was performed on the
M∞ = 0.80, α = 1.25° test case, which gave an asymptotic spectral radius ρ of
0.994. This was the most difficult case, and the lift was predicted to within
1 percent of the converged value in 180 work units. This calculation was then
repeated using 2-, 3-, and 4-grid multigrid (see fig. 4). Simple V-cycles
were used for the calculations. The grids of the 4-grid multigrid are
numbered from m = 1 for the finest grid to m = 4 for the coarsest grid. With
this numbering, the 2-m iteration strategy is defined as performing 2·m
iterations on grid m. The 2-grid multigrid gave good acceleration, and its
performance was modestly improved by the 3- and 4-grid solutions. The 4-grid
strategy gave ρ = 0.954 and predicted the lift in 90 work units (see table 2
and fig. 5).
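The 2-m rule can be written as a per-cycle relaxation schedule. The text does not say whether iterations are also performed on the upward leg of the V-cycle. This sketch therefore assumes smoothing only when a grid is reached on the way down. It also gives a rough nominal work estimate that assumes work proportional to cell count with full coarsening. The paper itself measures work by CPU time and counts the grid transfers, which this estimate omits.

```python
def v_cycle_schedule(num_grids, iters=lambda m: 2 * m):
    """(grid, iterations) pairs for one V-cycle, finest grid m=1 to coarsest.

    Default: the 2-m rule, i.e. 2*m iterations on grid m. Assumes relaxation
    only on the way down; the upward leg just prolongs corrections.
    """
    return [(m, iters(m)) for m in range(1, num_grids + 1)]

def nominal_work(schedule, dim=2):
    """Work per cycle in fine-grid-iteration units, assuming cost proportional
    to cell count and 2**dim times fewer cells on each coarser grid."""
    return sum(n / (2 ** dim) ** (m - 1) for m, n in schedule)

# Four grids with the 2-m rule: 2, 4, 6 and 8 iterations, about 3.5 nominal WU
schedule = v_cycle_schedule(4)
```

The rule deliberately spends more iterations on coarse grids, where they are cheap, to reduce low-frequency error components.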

In the M∞ = 0.63, α = 2° case, better performance was obtained (see fig. 6).
The single-grid calculation gave ρ = 0.983 and obtained the lift in 169 work
units. The 4-grid, V-cycle, 2-m iteration strategy gave a spectral radius of
0.953 and predicted the lift in only 75 work units. The corresponding spectral
radii for the M∞ = 0.80, α = 0° case were ρ = 0.986 for the single grid and
ρ = 0.945 for the 4-grid calculation. Notice that the multigrid performance is
nearly the same for the two lifting cases (ρ = 0.953 for M∞ = 0.63 and
ρ = 0.954 for M∞ = 0.80). Also notice that the presence of lift slows
convergence, as evidenced by the convergence rates of the two M∞ = 0.8 cases
(ρ = 0.954 for α = 1.25° and ρ = 0.945 for α = 0°).

Figure 4. Multigrid acceleration at M∞ = 0.80 and α = 1.25°.
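The asymptotic spectral radius ρ is the error reduction factor per iteration (single grid) or per cycle (multigrid). Consequently, about ln(F)/(-ln ρ) iterations reduce the error by a factor F. The helper below is a generic sketch of that formula and was not taken from the paper. The counts it returns are iterations or cycles, not work units, because a multigrid cycle costs more than one work unit.

```python
import math

def iterations_to_reduce(rho, factor):
    """Smallest n with rho**n <= 1/factor, i.e. ceil(ln(factor) / -ln(rho))."""
    if not 0.0 < rho < 1.0:
        raise ValueError("need 0 < rho < 1 for a convergent iteration")
    return math.ceil(math.log(factor) / -math.log(rho))

# One decade of error reduction at the spectral radii quoted for M = 0.80, alpha = 1.25 deg
single_grid = iterations_to_reduce(0.994, 10)   # 383 iterations
four_grid = iterations_to_reduce(0.954, 10)     # 49 cycles
```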

The multigrid calculations were further improved with the use of full
multigrid (FMG), i.e., grid refinement in conjunction with multigrid. Although
the asymptotic spectral radius remains unchanged, global aspects of the flow
field are predicted in fewer work units. A 4-grid FMG scheme with 30
iterations on the coarsest grid and 5 cycles on each of the two intermediate
grids was examined, using the 2-m iteration strategy. For the subcritical
M∞ = 0.63, α = 2° case, the lift was predicted in 43 work units with FMG,
versus 75 with the V-cycle (see table 2). In the supercritical M∞ = 0.8,
α = 1.25° case, FMG gave the lift in 46 work units, versus 90 with the
V-cycle. The work performed on the coarse grids during grid refinement is
included in these totals.

Figure 5. Comparison of convergence for a single grid and a 4-grid multigrid
calculation at M∞ = 0.80 and α = 1.25°.

The strategy of using 2-m iterations on each grid in the V-cycle is an attempt
to attack the low-frequency components of the error effectively by doing extra
work on the coarse grids. This may also be done by using W-cycles. If two
iterations are performed on each of the grids and FMG is also used, the lift
for the M∞ = 0.63, α = 2° case is predicted in only 38 work units. For the
M∞ = 0.8, α = 1.25° case, the lift is obtained in 30 work units (see fig. 7).
This performance is better than that of the V-cycle 2-m strategy. A
comparable calculation was performed by Anderson (ref. [16]) on a 193×33
C-grid using the method described in reference [8]; the lift was predicted (to
within 1 percent) in 33 work units.

Figure 6. Multigrid acceleration at M∞ = 0.63 and α = 2.0°.

First-order differencing was tried on the coarser grids while second-order
differencing was applied on the finest grid. First-order differences take less
time to compute than second-order differences, so a speed-up is possible.
Unfortunately, the convergence rate was slower than with second-order
differences everywhere. The converged solutions were the same with both
techniques.

Figure 7. Convergence history for FMG W-cycle at M∞ = 0.80 and α = 1.25°.

6. CONCLUSIONS

The isenthalpic form of the compressible Euler equations was solved for flow
about a lifting airfoil near and at transonic speeds. High-quality,
second-order-accurate results were obtained for three flow conditions.

Various  multigrid  strategies  were  employed  to  accelerate  convergence.  Fixed  V- 
and  W-cycles  were  studied.  The  W-cycle  was  found  to  predict  the  correct  lift 
in  fewer  work  units  than  the  V-cycle. 
ABSTRACT


For non-shared memory systems, static multigrid methods are usually
parallelized by domain decomposition and fixed assignment of subdomains to
processors. In this paper we consider multigrid methods where, following a
sequence of global grids, local refinements occur in the neighborhood of a
point. Even for methods with local refinements, a static mapping is useful if
the location of all local grids is known from the beginning. If their location
is determined dynamically, this kind of mapping must be replaced by
sufficiently frequent rebalancing. We suggest rebalancing after processing a
sequence of successive grids. The optimal sequence length is determined for a
wide class of computer architectures whose connection structure is an
orthogonal grid of connection lines. The achievable speedup is discussed on the
basis of complexity considerations.


1.  INTRODUCTION 


A usual method of parallelizing PDE solvers is domain splitting. The domain
of the differential equation is split into subdomains, which are assigned to
the processors of the system. A processor must then carry out all computations
for the grid points in its subdomain. In a parallel algorithm of this kind,
computational steps, e.g., relaxation, alternate with data exchange steps. The
data exchange before a computational step is necessary for a correct
computation of intermediate results even at the boundary of a subdomain. A
general analysis of this topic is given in [5].
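The need for a data exchange before each computational step can be illustrated with a 1-D Jacobi relaxation for -u'' = f with zero Dirichlet boundary values. The model problem and the function names are assumptions chosen for illustration. Each "processor" owns a contiguous block of interior points plus one ghost point on each side. Refreshing the ghost values from the neighbouring blocks before every sweep makes the split computation reproduce the global one exactly.

```python
import numpy as np

def jacobi_global(u, f, h, sweeps):
    """Reference Jacobi sweeps for -u'' = f on one array (end values fixed)."""
    u = u.copy()
    for _ in range(sweeps):
        new = u.copy()
        new[1:-1] = 0.5 * (u[:-2] + u[2:] + h * h * f[1:-1])
        u = new
    return u

def jacobi_split(u, f, h, sweeps, nparts):
    """Same sweeps on nparts subdomains, each with one ghost point per side."""
    n = len(u)
    owned = np.array_split(np.arange(1, n - 1), nparts)   # interior points per "processor"
    local = [u[b[0] - 1:b[-1] + 2].copy() for b in owned]  # owned points plus ghosts
    for _ in range(sweeps):
        # Data exchange step: copy neighbours' outermost owned values into the ghosts.
        for k in range(nparts):
            if k > 0:
                local[k][0] = local[k - 1][-2]
            if k < nparts - 1:
                local[k][-1] = local[k + 1][1]
        # Computational step: every subdomain relaxes its own points independently.
        for k, b in enumerate(owned):
            loc, fl = local[k], f[b[0] - 1:b[-1] + 2]
            new = loc.copy()
            new[1:-1] = 0.5 * (loc[:-2] + loc[2:] + h * h * fl[1:-1])
            local[k] = new
    out = u.copy()
    for k, b in enumerate(owned):
        out[b] = local[k][1:-1]
    return out
```

Omitting the exchange step would leave stale ghost values, so the points next to a subdomain boundary would be relaxed with wrong neighbour data.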

Multigrid methods have already been described, e.g., in [15], and are
therefore assumed to be known. For large problems, methods with local
refinements are intended to concentrate the computational work on those
locations of the fine grids where the desired accuracy requires it. Methods of
this kind have been known for some years (cf. [3]). More recent results are
published, e.g., in [12], [7] and [1]; [1] also contains an overview of other
papers relevant to the subject. The aim of the present paper is to study which
part of the problem parallelism can be exploited on a special class of
computer architectures. We discuss a simple problem type where, beginning with
a certain grid level, further refinements occur only in the neighborhood of a
point. Let all the following grids have an equal number of points. Such
examples have been considered in [12] and [14].

We  consider  message  passing  systems  consisting  of  many  independent 
processors.  The  messages  can  be  sent  via  a  b-dimensional  orthogonal  system  of 
connection  lines. 

A  multigrid  method  with  local  refinements  apparently  requires  a  different 
partitioning  and  mapping  for  different  parts  of  the  algorithm.  We  study  some 
techniques  here  to  avoid  such  rebalancing  whenever  possible. 

If the location of all local grids is known a priori, then there is a simple
and nearly optimal strategy of partitioning and mapping. Our decomposition
differs from usual decompositions because we do not use axis-parallel
intersections alone. For every grid, we decompose that region of the domain
which is not refined in the next level. The parts are then mapped onto the
system such that the workload is balanced with respect to every considered
region. Furthermore, the neighborhood of subdomains can be preserved after
mapping. In this way, we obtain results similar to those obtained without
local refinements.

If the location of local grids is determined dynamically, then a fixed
decomposition and mapping of the whole domain for all grids sometimes leads to
an unbalanced workload. In such cases, we must occasionally redistribute the
problem within the sequence of grids. In non-shared memory systems, a
redistribution between two subsequent grids requires a data transfer of the
size of the coarser grid.

Hoppe and Mühlenbein (cf. [8]) use a natural partitioning and mapping for
each local grid. This requires a redistribution of subdomains between the
processing of any two subsequent local grids. For our system structure, the
resulting upper bound for the efficiency is independent of problem size and
tends to 0 for increasing system size (cf. section 7). Therefore, this method
is useful only for small problems and systems.

To minimize the data transfer, the sequence of all grids within a V-cycle
is divided into appropriate subsequences. For every subsequence, the standard
scattered decomposition developed by Fox and Otto is used (cf. [4]). A
redistribution of the workload is required only at the beginning of a
subsequence. The mapping is illustrated in fig. 4.

It is the aim of this paper to obtain assertions on the possible speedup
achieved on parallel computers. Let our system consist of $P$ processors. For
practical reasons, efficient algorithms showing a speedup of $\Theta(P)$ are of
special interest. The well-known $O$-, $\Omega$- and $\Theta$-notation is used
for upper and lower bounds and for the exact order of magnitude of a function
(cf. [9]). Let an algorithm be defined for all relevant problem sizes $N$ and
adequate system sizes $P$. Let $A(P)$ denote the time cost required for
solving a specific problem on $P$ processors, measured in a unit cost measure.
As usual, we call $S(P) = A(1)/A(P)$ the speedup and $E(P) = S(P)/P$ the
efficiency. We call an algorithm efficient if the number of processors
$P = F(N)$ to be used for problem size $N$ is determined as a function of $N$
and if $E(P) = \Omega(1)$ holds for $N \to \infty$.

Sections 2 and 3 give a precise description of the computer structure and the
model problem. The case of a priori known local grids is briefly considered in
section 4. The decomposition in the case of adaptive local refinement is
defined in section 5. We investigate the case of many local grids in section 7
and that of few local grids in section 8. The break-even point of our method
is discussed in section 9. Parts of the present paper are identical with a GMD
preprint [13], which discusses some aspects in more detail.

2.  DEFINITION  OF  THE  COMPUTER  STRUCTURE 

We consider non-shared memory systems consisting of many independent
processors and buses. Two buses, or a bus and a processor, can be connected by
a connection unit. Let the processors be $P(i_1,\dots,i_b)$ with
$i_j = 0,\dots,2^p-1$ and $j = 1,\dots,b$. Let $v$ be an integer,
$0 \le v \le p$. The elements of every subset

$$\{P(i_1,\dots,i_b) : k2^v \le i_j < (k+1)2^v,\ i_l \text{ fixed for } l \ne j,\ l = 1,\dots,b\}$$

with a fixed $j$ and a fixed $k \in \{0,1,\dots,2^{p-v}-1\}$ are connected by a
single bus. All processors that coincide in $b-1$ coordinates are located on a
connection line. For simplicity, we assume that every connection line forms a
ring. The system has $P = 2^{bp}$ processors. Each connection line is a chain
of buses coupled via connection units. Each bus connects $2^v$ processors, and
$2^u$ buses form a connection line of length $2^p = 2^{u+v}$ (given by the
number of processors). Fig. 1 shows an example of this structure. Let the time
cost for transporting a packet with $x$ data elements via a bus be a linear
function of $x$. Only one processor of a bus can act as a sender in a time
unit. We assume that all buses can work independently of each other. Let the
performance of the connection units be so high that their share of the time
cost is not relevant to an algorithm. The memories are assumed always to be
large enough. Therefore, the implementation of parallel algorithms is
restricted only by the number of processors, the number of buses and their
structure.

FIG. 1: A connection line of 8 processors realized by 4 buses with 2
processors each ($p=3$, $u=2$, $v=1$).
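The bus structure can be enumerated directly from the definition above, as in the following sketch. The path-length helper rests on an assumption that the text does not spell out: a message between two processors on one ring-shaped connection line passes the shorter way around the ring, and it is counted as "number of buses passed" = ring distance between bus indices + 1. The helper names are hypothetical.

```python
from itertools import product

def buses(b, p, v):
    """Group the processors P(i_1..i_b), 0 <= i_j < 2**p, into buses.

    A bus in direction j holds the 2**v processors whose j-th index lies in
    [k*2**v, (k+1)*2**v) while all other indices are fixed.
    Returns a dict (j, k, other_indices) -> list of member processors.
    """
    groups = {}
    for proc in product(range(2 ** p), repeat=b):
        for j in range(b):
            k = proc[j] // 2 ** v
            others = proc[:j] + proc[j + 1:]
            groups.setdefault((j, k, others), []).append(proc)
    return groups

def path_length(i, k, p, v):
    """Buses passed by a message between positions i and k on one ring-shaped
    connection line of 2**u buses (u = p - v); assumed counting convention."""
    u = p - v
    d = abs(i // 2 ** v - k // 2 ** v)
    return min(d, 2 ** u - d) + 1
```

For Fig. 1 ($b=1$, $p=3$, $v=1$) this yields 4 buses with 2 processors each. The longest path on such a line passes $2^{u-1}+1$ buses, which is of order $2^u$.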

This model can represent, e.g., $b$-dimensional nearest-neighbor systems
(NN-systems, cf. [10]) with $v=0$ and orthogonal bus-coupled systems (cf. [11])
with $u=0$. Our results are also applicable to hierarchical systems like
SUPRENUM if the highest system level is the bottleneck (cf. [11]).

3.  THE  MODEL  PROBLEM 

We confine ourselves to the Poisson equation with Dirichlet boundary
conditions on a cubic domain $R = \{x = (x_1,\dots,x_d) : 0 < x_j < 1,\ j = 1,\dots,d\}$
in $d \ge 1$ dimensions.

Let the discrete problem be defined on a sequence of standard grids $G_i$
$(1 \le i \le n+m)$. The coarsest grid $G_1$ consists only of the midpoint of
$R$. The grid $G_i$ $(i > 1)$ results from $G_{i-1}$ by bisection of the edges
of the meshes. Let the grids be global for $i \le n$; they therefore have
$(2^i-1)^d$ points. Let the local grids $G_i$ $(n < i \le n+m)$ be a cubic part
of the corresponding complete grid with $2^{dn}$ points. Without loss of
generality, let them all be in the neighborhood of a specific vertex of $R$
(cf. fig. 2). We consider V-cycling and FMG. Let the relaxation method be
unaffected by domain splitting (e.g., Jacobi or odd-even relaxation). With
respect to the multigrid method, we use only the following facts. The
computational work per grid point, grid level and V-cycle is bounded. The
weighting of grid level $i$ is 1 for V-cycling and $n+m-i+1$ for FMG. We
sketch the proofs only for V-cycling, but the resulting remarks apply also to
FMG. This model problem does not reflect the conditions occurring in large
applications, but it allows a simple analysis and a short description of the
behavior of parallel systems.

FIG. 2: Two local grids in the upper left corner and the global grids
(levels 1, n=2, 3 and 4).
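The grid sizes and level weightings of the model problem can be tabulated with the sketch below. It assumes unit work per grid point and level, which stands in for the paper's "bounded work per point". The helper names are hypothetical.

```python
def grid_points(i, n, d):
    """Points of G_i: (2**i - 1)**d for global grids (i <= n),
    2**(d*n) for the local grids (i > n)."""
    return (2 ** i - 1) ** d if i <= n else 2 ** (d * n)

def total_work(n, m, d, fmg=False):
    """Work summed over all levels with unit work per point and level.

    Level i is weighted by 1 for a V-cycle and by n+m-i+1 for FMG.
    """
    return sum((n + m - i + 1 if fmg else 1) * grid_points(i, n, d)
               for i in range(1, n + m + 1))

# Fig. 2 configuration: n = 2 global and m = 2 local grids in d = 2 dimensions
v_work = total_work(2, 2, 2)              # 1 + 9 + 16 + 16
fmg_work = total_work(2, 2, 2, fmg=True)  # 4*1 + 3*9 + 2*16 + 1*16
```

Because every local grid has as many points as the finest global grid, the V-cycle work grows linearly in $m$, in line with the $O(N(m+1))$ estimate used in section 6.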


4.  A  PRIORI  KNOWN  LOCATION  OF  LOCAL  GRIDS 

If the location of the local grids is known from the beginning of a
V-cycle, we can find an appropriate static decomposition of the domain. For
every grid $G_i$ $(n \le i \le n+m)$, the part of the domain $R$ containing
points of $G_i$ but none of $G_{i+1}$ is decomposed and mapped onto the whole
system. The decomposition and mapping must lead to a good balancing of the
computational and transport work for this region and all grids. In particular,
the neighborhood of subdomains should be preserved in the system. Moreover,
adjacent subdomains separated by the inner boundary of the local grids should
be mapped onto neighboring processors. Fig. 3 shows such a mapping of our
model problem to a 2-dimensional system of 4 processors. This decomposition
and mapping may easily be generalized to larger systems, higher dimensions and
other locations of the refinement point.

In comparison with a standard mapping of a problem of size $N$ with $n$ global
grids (cf. [10] and [11]), we see the following changes. Let us decompose the
given problem into subproblems. These subproblems are defined by restriction
to a region of the given domain showing grid points exactly up to a certain
grid level. In the case of $m$ local grids, the problem is composed of $m+1$
subproblems showing the same behavior as a static problem with $n$ global
grids. The sizes of the boundary data exchange and of the computational work
differ between the compared algorithms only by a factor $O(m+1)$. Therefore,
the achievable speedup for efficient algorithms is of identical size in both
cases, and we can immediately extract the results from the papers mentioned
above.

REMARK 4.1: In the case of a priori known location of local grids, the
achievable speedup for efficient parallel multigrid algorithms with $n$ global
and $m$ local grids is:

$$S(P) = \Omega\left(N^{b/(b+d)}\right) \quad \text{for bus systems}$$

and

$$S(P) = \Omega\left(N^{b/\max(b+1,\,d)}\right) \quad \text{for NN-systems.}$$

For the above result, all problem dimensions have to be split. When only $b$
coordinates are subdivided, we obtain $S(P) = \Omega(N^{b/(2d)})$ for bus
systems and $S(P) = \Omega((N/\log N)^{b/d})$ for NN-systems if $d = b+1$.
Although the speedup in remark 4.1 is optimal for a problem without local
refinements, this need not hold in our case.

In  practice,  a  restriction  to  axis  parallel  intersections  is  desired.  Such 
a  restriction  may  reduce  the  achievable  speedup  considerably  (cf.  [2]). 
FIG. 3: Mapping of a 2-dimensional domain with known local grids in the upper
left corner onto a 2-dimensional system with 4 processors ($p=1$). Legend:
boundary of grids; additional boundary of subdomains. The mapping onto the
processors $P(i,k)$ $(i,k = 0,1)$ is marked for all regions.
FIG. 4: Mapping of a problem with 3 local grids and redistribution-free
subsequences of length $z=2$ onto a 4-processor system (cf. fig. 3). Legend:
boundary of grids; boundary of subdomains for coarser grids (dashed); boundary
of subdomains for finer grids.


5.  DECOMPOSITION  AND  MAPPING  FOR  DYNAMICALLY  DETERMINED  LOCAL  GRIDS 

We assume that the dimension of the problem exceeds the dimension of the
system ($d > b$). Let $l \ge 0$ be an integer parameter characteristic of the
partitioning and mapping of a subsequence of all grids. The domain $R$ is then
decomposed into the subdomains

$$R_l(i_1,\dots,i_b) = \{x = (x_1,\dots,x_d) :\ 0 < x_j < 2^{-p-l} \text{ if } i_j = 0,\ \ i_j 2^{-p-l} \le x_j < (i_j+1)\,2^{-p-l} \text{ if } i_j \ge 1,\ \text{for } j = 1,\dots,b;\ \ 0 < x_j < 1 \text{ for } j = b+1,\dots,d\}$$

where the indices $i_j \in \{0,\dots,2^{p+l}-1\}$. The mapping is then given by

$$R_l(i_1,\dots,i_b) \to P(i'_1,\dots,i'_b) \quad \text{if } i'_j = i_j \bmod 2^p \text{ for } j = 1,\dots,b.$$

Due to the restriction to this simple type of mapping, $d > b$ must hold, and
it is generally impossible to obtain optimal results in the case of
differences in dimension.
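The decomposition and the scattered mapping can be sketched as follows. In this sketch only the first $b$ coordinates are split, and points are assumed to lie inside the open unit cube. The helper `active_processors` is hypothetical. It counts the processors that own part of a small cube $[0, 2^{-k})^b$ at the refinement vertex, which shows why the parameter $l$ matters: with $l = 0$ such a local region may fall onto a single processor, whereas a larger $l$ scatters it over many.

```python
from itertools import product

def subdomain_index(x, b, p, l):
    """Indices (i_1..i_b) of the subdomain R_l containing point x, 0 < x_j < 1.

    Each of the first b coordinates is split into 2**(p+l) slabs of width 2**-(p+l).
    """
    s = 2 ** (p + l)
    return tuple(min(int(x[j] * s), s - 1) for j in range(b))

def processor_of(index, p):
    """Scattered mapping: R_l(i_1..i_b) -> P(i_1 mod 2**p, ..., i_b mod 2**p)."""
    return tuple(i % 2 ** p for i in index)

def active_processors(k, b, p, l):
    """Processors owning part of the cube [0, 2**-k)**b in the split coordinates."""
    s = 2 ** (p + l)
    per_dim = max(1, s // 2 ** k)   # subdomains per direction overlapping [0, 2**-k)
    return {processor_of(idx, p) for idx in product(range(per_dim), repeat=b)}
```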
6. BASIC CONSIDERATIONS ON COMPLEXITY OF COMPUTATIONAL AND TRANSPORT WORK


The number of operations per grid point and V-cycle is bounded. $N = 2^{dn}$
is the number of points of the finest global grid and of each of the $m$ local
grids. Therefore, the computational work for a V-cycle on a single processor
needs the time cost $O(N(m+1))$. For efficiency,

$$A(P) = O\left(N(m+1)/P\right) = O\left((m+1)\,2^{dn-bp}\right) \qquad (1)$$

is required for the total time cost $A(P)$ of an algorithm using $P$ processors.

Let $m = \gamma n$ with a constant $\gamma$. We investigate two cases,
$1 \le \gamma$ and $0 < \gamma < 1$. In the case $m \ge n$, the achievable
speedup is determined by the local grids only. The local grids
$n+1,\dots,n+m$ are subdivided into subsequences, and it is sufficient to
analyze one such subsequence. The grids $1,\dots,n$ are added to the adjacent
subsequence; these grids are covered by the analysis of the case $m < n$.

Let a subsequence of local grids be given between the levels $y$ and $n+l$
$(n < y \le n+l)$. Let a partition of the domain and a mapping of the
subdomains to the processors be defined as in section 5 with this parameter
$l$. The distribution of the corresponding subdomains is carried out at level
$y$, i.e., between the computations of level $y-1$ and those of level $y$.

Only two subsequences are considered in the case $m < n$. The first
subsequence contains the grid levels 1 to $y-1$; its mapping parameter is
$l = 0$. As before, a redistribution is made at level $y$. The second
subsequence contains the levels $y$ to $n+m$, and its mapping parameter is
$l = m$.

Our domain splitting produces brick-shaped subdomains with two different edge
lengths. The long edges correspond to the unsplit edges of the original
domain. The short edges are obtained from the original edges by $p+l$
successive bisections. Only non-empty subdomains, i.e., those containing at
least one point of the current grid, need to be considered at level $i$.

For grid level $i$ $(1 \le i \le n+m)$, we obtain the following sizes of the
essential components of the total cost. The $d-b$ long edges possess

$$C_1(i,l) = 2^{\min(i,\,n)} + O(1) \qquad (2)$$

grid points. The maximum length of the short edges is

$$C_2(i,l) = \left\lceil 2^{\min(i-p-l,\,n)} \right\rceil + O(1). \qquad (3)$$

The maximum number of non-empty subdomains per processor is

$$C_3(i,l) = \left\lceil 2^{b\,\min(i,\,l,\,n,\,n+l-i)} \right\rceil + O(1). \qquad (4)$$

A path length is measured by the number of buses to be passed by a message.
The path length for boundary data exchange is

$$C_4(i,l) = \left\lceil 2^{u+l-i} \right\rceil + O(1) \ \text{ if } i > l, \qquad C_4(i,l) = 0 \ \text{ if } i \le l. \qquad (5)$$

The maximum number of processors per bus that have to process a non-empty
subdomain at the $i$-th level is

$$C_5(i,l) = \left\lceil 2^{\min(v,\,i-u-l)} \right\rceil + O(1) \ \text{ if } i \le n+l+u. \qquad (6)$$

The cost of the computational work or of the boundary data exchange of a grid
can be obtained as a product of factors such as (2) to (6).

Finally, we determine the domain data exchange on a system with $P$ processors
in the case of problem redistribution at level $y$. Without loss of
generality, we analyse only the transfer from fine to coarse grids. For this
purpose, a constant fraction of all points must be moved, i.e., an amount whose
size corresponds to the total number of grid points, $O(2^{d\min(y,n)})$. At
level $y-1$, there are more active processors having non-empty subdomains than
at level $y$. For the redistribution of data, we use all buses which are
connected with a processor that is active at level $y-1$. Our result cannot be
improved essentially by using more than these buses. We can activate all these
buses only if we first spread the data packets of a processor to an
appropriate bundle of connection lines. In the following transport phase, the
$b$ directions of connection lines are used one after the other. All data
packets are transported as far as possible in the current direction. For the
transport in any direction, a data set of size $O(2^{d\min(y,n)})$ must pass
the buses belonging to an intersection hyperface of the system. This requires
a time cost of

$$O\left(\left\lceil 2^{d\min(y,n) - (b-1)\min(y,p)} \right\rceil\right). \qquad (7)$$

Furthermore, at least one data element has to be transported on a path of
half the length of a connection line (in number of buses). The minimum time
cost for passing such a path is

$$O(2^u). \qquad (8)$$

During the spreading of data packets, the system bottleneck is the set of
buses which are connected to a processor that is active at level $y$.
According to (6), the number of these processors is
$O(2^{b\min(p,\,\max(0,\,y-l))})$. This number must be divided by the number of
active processors belonging to a single bus,
$O(\lceil 2^{\min(v,\,\max(0,\,y-u-l))} \rceil)$. Therefore, the cost for the
distribution of the transport packets of the processors has the size

$$\Omega\left(2^{d\min(y,n) - b\min(p,\,\max(0,\,y-l)) + \min(v,\,\max(0,\,y-u-l))}\right). \qquad (9)$$

7.  OPTIMAL  DISTANCE  BETWEEN  REDISTRIBUTIONS  IN  CASE  OF  MANY  LOCAL  GRIDS 

Let  us  now  consider  a  sequence  of  local  grids  between  the  levels  y>p+l>n 
and n+l of the length z=n+l-y+1. Again, l is the mapping parameter (cf. section 5). Due to (2)-(4), the computational work required for the grids is:
$$A_c(P) = O\Big(\sum_{i=y}^{n+l} C_1(i,l)^{d-b}\,C_2(i,l)^{b}\,C_3(i,l)\Big),$$

$$A_c(P) = \Theta\big(z\,2^{dn-bp}\big). \tag{10}$$

For  the  boundary  data  exchange,  we  obtain  from  (2)  -  (6): 

$$A_b(P) = O\Big(\sum_{i=y}^{n+l} C_1(i,l)^{d-b}\,C_2(i,l)^{b-1}\prod_{j=3}^{5} C_j(i,l)\Big)$$

Ab(P)  =  0(2dn-bp-n+p  +  z-fV)  .  (11) 

As the domain data exchange for a redistribution, we obtain from (7)-(9):

$$A_d(P) = \Omega\big(2^{dn-(b-1)p} + 2^{u} + 2^{dn-bp+v}\big),$$

$$A_d(P) = O\big(2^{dn-bp+p}\big). \tag{12}$$

(10)  -  (12)  together  deliver  the  total  cost: 

A  (P)  =  0(2dn-bP(z+2P+2P+z+v-n))  (13) 

The last term in (13) is irrelevant because $y > p+l$ and $v \le p$. For efficiency, $p \le \log z + O(1)$ is required. The average work $A(P)/z$ per grid is minimal if $z = n-p+O(1)$, i.e., we must select a maximal $z$.

Let us extend the consideration to a longer subsequence that contains the levels $p+l$ and $n+l$. Using (2)-(4) again, we obtain between the levels $p+l-x$ and $n+l+x$:

$$A_c(P) = O\big((n-p)\,2^{dn-bp} + 2^{dn-bp+bx}\big). \tag{14}$$

For efficiency in the considered subsequence of length $z = n-p+2x$, it is necessary that

$$dn-bp+bx \le dn-bp+\log(n-p+2x)+O(1),$$

i.e., $x \le \frac{1}{b}\log(n-p) + O(1)$.

From (7) it follows that $A(P) = \Omega(2^{dn-bp+p})$. Hence, for efficiency it must hold that

$$2^{dn-bp+p} = O\big(z\,2^{dn-bp}\big),$$

i.e., $p \le \log(n-p+2x) + O(1)$.

Because of these bounds for $x$ and $p$, we obtain

$$x \le \frac{1}{b}\log n + O(1) \quad\text{and}\quad p \le \log n + O(1). \tag{15}$$

Summarizing  the  above,  we  obtain  the  main  result: 

REMARK 7.1: The maximum system size $P = 2^{bp}$ for efficient algorithms is determined by $p = \log n + O(1)$. The speedup achievable by an efficient algorithm in a subsequence of local grids is:

$$S(P) = \Omega(\log^b N).$$

This result can be achieved with redistribution-free subsequences of the length $z = n-p+O(1)$.
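Remark 7.1 can be made concrete. If the $O(1)$ term is set to zero, then $p = \log_2 n$ and $P = 2^{bp} = n^b$, where $n = \log_2(N)/d$ for $N = 2^{dn}$. The minimal sketch below assumes base-2 logarithms, and its helper names are hypothetical. It shows how slowly the admissible system size grows with the problem size.

```python
import math

def max_system_size(n, b):
    """P = 2**(b*p) with p = log2(n); the O(1) term of Remark 7.1 is set to 0."""
    p = math.log2(n)
    return 2 ** (b * p)

def subsequence_length(n):
    """Length z = n - p of a redistribution-free subsequence (O(1) term dropped)."""
    return n - math.log2(n)

d, b = 3, 2
for n in (8, 16, 32):
    N = 2 ** (d * n)                      # points on the finest global grid
    P = max_system_size(n, b)
    # P coincides with (log2(N)/d)**b, i.e. it grows only like log^b N
    print(n, P, (math.log2(N) / d) ** b, subsequence_length(n))
```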
The optimal length of redistribution-free subsequences of grids is found to be $n+O(\log n)$. The term $O(\log n)$ cannot be given more precisely for the whole class of architectures because it varies between different members of that class. We do not pursue such considerations here because that term does not influence the order of the achievable speedup.

The result $S(P) = \Omega(\log^b N)$ means that in systems of arbitrary size an efficient computation is possible if the problem size is large enough. $P = O(\log^b N)$ is sufficient, i.e., the problem size must be very much larger than the system size.

For comparison, let us consider a method with redistribution of subdomains between any two local grids. Let $m = \gamma n$ and $\gamma > 0$. In case of optimal balancing, the computational work for a local grid is $O(2^{dn-bp})$. Because of (7), the redistribution requires the cost $\Omega(2^{dn-(b-1)p})$. The efficiency is then bounded by the ratio of these expressions.

REMARK 7.2: In case of redistribution of subdomains for every local grid, the efficiency is bounded by

$$E(P) = O(2^{-p}).$$

Efficient algorithms require a constant system size.


8.  ANALYSIS  IN  CASE  OF  FEW  LOCAL  GRIDS 


In case of few local grids, we set $m = \gamma n$ with $0 < \gamma < 1$. A redistribution of the domain is made at the level $y$. The mapping for the finer grids is done with the parameter $l = m$ and for the coarser grids with $l = 0$.

Using (2)-(4), we obtain for the computational work:

$$A_c(P) = O\Big(\sum_{i=1}^{n+m} C_1(i,l)^{d-b}\,C_2(i,l)^{b}\,C_3(i,l)\Big). \tag{16}$$

The size of the boundary data exchange follows from (2)-(6):

$$A_b(P) = O\Big(\sum_{i=1}^{n+m} C_1(i,l)^{d-b}\,C_2(i,l)^{b-1}\prod_{j=3}^{5} C_j(i,l)\Big). \tag{17}$$

Because of (7)-(9), the redistribution of the domain at level $y$ requires:

$$A_d(P) = \Omega\big(2^{dy-(b-1)\min(y,p)} + 2^{u} + 2^{dy-b\max(0,y-m)+\max(0,y-m-u)}\big). \tag{18}$$

We then obtain for the total cost:

$$A(P) = A_c(P) + A_d(P) + A_b(P). \tag{19}$$

We continue the discussion for some special cases only.

CASE 1: Let the system be an orthogonal bus system (cf. [11]), i.e., $v = p$. An optimal result can be achieved for $y = 1$ and $p \le n-m$. Neglecting all terms of (16) to (19) that are obviously not leading, we obtain:

$$A(P) = O\big((m+1)\,2^{dn-bp} + 2^{(d-b+1)p+dm} + 2^{(d-1)n-(b-2)p+m}\big). \tag{20}$$

For an efficient algorithm, the first term of (20) must be the leading term. Let the first and the $j$-th term of (20) ($j = 2$ or $3$) have the same size if $p = p_{1j}$. Then

$$p_{12} = \frac{d}{d+1}(n-m) + \frac{1}{d+1}\log(m+1) + O(1),$$

$$p_{13} = \frac{n-m}{2} + \frac{1}{2}\log(m+1) + O(1).$$

The maximum $p$ for efficient algorithms is $p = \min(p_{12}, p_{13})$. Since $p_{13} < p_{12}$ for large $n$, we obtain the result:

REMARK 8.1: In bus systems with $v = p$, the system size $P = 2^{bp}$ is bounded in case of efficient algorithms by

$$p = \frac{n(1-\gamma)}{2} + \frac{1}{2}\log(m+1) + O(1).$$

A redistribution is not necessary. The achievable speedup is

$$S(P) = \Omega\Big(\big(N^{1-\gamma}(m+1)^d\big)^{b/(2d)}\Big).$$
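The balance points $p_{12}$, $p_{13}$ and the speedup of Remark 8.1 can be cross-checked numerically. The sketch below makes several assumptions: the function name is hypothetical, the $O(1)$ terms are dropped, logarithms are base 2, and $m = \gamma n$. It compares $2^{bp}$ for $p = \min(p_{12}, p_{13})$ with $(N^{1-\gamma}(m+1)^d)^{b/(2d)}$.

```python
import math

def bus_system_bound(n, m, d, b):
    """Balance points p12, p13 for a bus system (v = p), O(1) terms dropped."""
    L = math.log2(m + 1)
    p12 = (n - m) * d / (d + 1) + L / (d + 1)
    p13 = (n - m) / 2 + L / 2
    p = min(p12, p13)
    return p12, p13, 2 ** (b * p)       # last entry: system size P = 2**(b*p)

n, m, d, b = 20, 5, 2, 2                # gamma = m/n = 0.25
p12, p13, P = bus_system_bound(n, m, d, b)
gamma = m / n
N = 2 ** (d * n)
speedup = (N ** (1 - gamma) * (m + 1) ** d) ** (b / (2 * d))
print(p13 < p12, P, speedup)            # P and speedup both come out near 196608
```

Since $n - m = n(1-\gamma)$, $b\,p_{13}$ equals $\frac{b}{2d}\log\big(N^{1-\gamma}(m+1)^d\big)$ exactly, so the two printed sizes agree up to rounding.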

CASE 2: Let the system be an NN-system, i.e., $u = p$ and $v = 0$.

CASE 2.1: Let $d = b$. An optimal result can be achieved if $y \le m+p \le n$. Using (16)-(19), we obtain:

$$A(P) = O\big((m+1)\,2^{d(n-p)} + 2^{dy-(d-1)\min(y,p)} + 2^{p} + 2^{(d+1)m+p-y}\big). \tag{21}$$

For $y > p$, the second term of (21) equals $2^{dy-dp+p}$. The following choice for $y$ is appropriate to minimize $A(P)$:

$$dy-dp+p = (d+1)m+p-y+O(1),$$

i.e., $y = m + \frac{pd}{d+1} + O(1)$.

Inserting this value in (21), we obtain:

$$A(P) = O\big((m+1)\,2^{d(n-p)} + 2^{p} + 2^{dm+p/(d+1)}\big).$$

Now we have to minimize $A(P)$ as a function of $p$. Let the first and the $j$-th term of this expression have the same size if $p = p_{1j}$:

$$p_{12} = \frac{dn + \log(m+1)}{d+1} + O(1),$$

$$p_{13} = \frac{\big(d(1-\gamma)n + \log(m+1)\big)(d+1)}{d(d+1)+1} + O(1).$$

In both cases ($p = p_{12}$ and $p = p_{13}$), $y > p$ holds for sufficiently large $n$ iff $\gamma > d/(d+1)^2$. For those values of $\gamma$, it holds that $p_{13} < p_{12}$. We get the minimum of $A(P)$ if $p = \min(p_{12}, p_{13})$. Therefore, the best choice is $p = p_{13}$.
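The two balance points of Case 2.1 and the crossover at $\gamma = d/(d+1)^2$ can be probed numerically. The Python sketch below evaluates $p_{12}$, $p_{13}$ and the balancing level $y = m + pd/(d+1)$. It assumes hypothetical function names, drops the $O(1)$ terms, uses base-2 logarithms, and sets $m = \gamma n$.

```python
import math

def nn_balance_points(n, gamma, d):
    """p12 and p13 for an NN-system with d = b (O(1) terms dropped)."""
    m = gamma * n
    L = math.log2(m + 1)
    p12 = (d * n + L) / (d + 1)
    p13 = (d * (1 - gamma) * n + L) * (d + 1) / (d * (d + 1) + 1)
    return p12, p13

def redistribution_level(m, p, d):
    """Solve d*y - d*p + p = (d+1)*m + p - y for y."""
    return m + p * d / (d + 1)

d, n = 2, 1000
threshold = d / (d + 1) ** 2            # 2/9 for d = 2
for gamma in (0.1, 0.3):                # below / above the threshold
    p12, p13 = nn_balance_points(n, gamma, d)
    y = redistribution_level(gamma * n, p13, d)
    print(gamma, gamma > threshold, p13 < p12, y > p13)
```

Below the threshold, $p_{13}$ exceeds $p_{12}$ and the balancing level falls below $p$. Above it, both relations flip, as the text states.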

For $y < p$, the second term of (21) equals $2^{y}$. In case of $(d+1)m \le y \le p$, the second and the fourth terms are dominated by $2^{p}$. $A(P)$ is then minimal if $p = p_{12}$. The condition $(d+1)m < p$ is fulfilled if $\gamma < d/(d+1)^2$. The result is:

REMARK 8.2: In NN-systems ($v = 0$) and in case of identical dimension ($d = b$), the system size for efficient algorithms is bounded by $P = 2^{dp}$ and:

$$p = \begin{cases} \dfrac{dn + \log(m+1)}{d+1} + O(1) & \text{if } \gamma \le d/(d+1)^2, \\[2ex] \dfrac{\big(d(1-\gamma)n + \log(m+1)\big)(d+1)}{d(d+1)+1} + O(1) & \text{if } \gamma > d/(d+1)^2. \end{cases}$$

An appropriate level $y$ for redistribution is $(d+1)m + O(1) \le y \le p + O(1)$ if $\gamma < d/(d+1)^2$, and otherwise $y = m + pd/(d+1) + O(1)$. The achievable speedup is

$$S(P) = \begin{cases} \Omega\big((N(m+1))^{d/(d+1)}\big) & \text{if } \gamma < d/(d+1)^2, \\[1ex] \Omega\big((N^{1-\gamma}(m+1))^{d(d+1)/(d(d+1)+1)}\big) & \text{if } 1 > \gamma > d/(d+1)^2. \end{cases}$$

For FMG and $\gamma = 0$, the expression $m+1$ has to be replaced by $1/n$.

Comparing Remarks 8.1 and 8.2, we observe a slight superiority of the NN architecture. For small $\gamma$, NN-systems show a much better behavior than bus systems. In case of bus systems, the maximum speedup decreases as $\gamma$ increases. For NN-systems, the speedup increases slightly with $\gamma$ if $\gamma < d/(d+1)^2$. Only for greater values of $\gamma$ does the result become worse. For real systems, however, the speedup strongly depends on the performance parameters of the different system components. Because bus- and NN-systems use different technologies for their transport units, we cannot compare two real systems of given size in this way. But the comparison shows the better scalability of NN-systems in a homogeneous technology.

CASE 2.2: Let $d = b+1$. To achieve an optimal result, we can use $y = 1$ and $p \le n-m$. Neglecting all terms of (19) that are not leading, we obtain:

$$A(P) = O\big((m+1)\,2^{dn-bp} + p\,2^{(b+1)m+p}\big). \tag{22}$$

$A(P)$ is minimal if its terms are of identical size. We find the result:

REMARK 8.3: Let an NN-system ($v = 0$) be given. Let the problem dimension exceed the system dimension by 1 ($d = b+1$). For efficient algorithms, the system size $P = 2^{bp}$ is then bounded by

$$p = \begin{cases} n - \frac{1}{d}\log n + O(1) & \text{if } \gamma = 0, \\ n(1-\gamma) + O(1) & \text{if } 0 < \gamma < 1. \end{cases}$$

A redistribution is not necessary. The achievable speedup is

$$S(P) = \begin{cases} \Omega\big((N/\log N)^{b/d}\big) & \text{if } \gamma = 0, \\ \Omega\big(N^{(1-\gamma)b/d}\big) & \text{if } 0 < \gamma < 1. \end{cases}$$

In case of FMG and $\gamma = 0$, the result has to be replaced by $p = n - \frac{2}{d}\log n + O(1)$ and $S(P) = \Omega\big((N/\log^2 N)^{b/d}\big)$.
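The bound of Remark 8.3 comes from equating the two terms of (22). The minimal Python sketch below assumes base-2 logarithms, and its helper name is hypothetical. With $d = b+1$, it solves $\log(m+1) + dn - bp = \log p + (b+1)m + p$ for $p$ by bisection. It then compares the root with $n - \frac1d\log n$ and with $n(1-\gamma)$. The differences should stay bounded, as the $O(1)$ terms suggest.

```python
import math

def balance_p_case22(n, m, b, iters=100):
    """Root p in [1, n] where both terms of (22) have equal size (d = b + 1)."""
    d = b + 1

    def gap(p):
        # log2 of the first term of (22) minus log2 of the second term
        return (math.log2(m + 1) + d * n - b * p) - (math.log2(p) + (b + 1) * m + p)

    lo, hi = 1.0, float(n)               # gap(lo) > 0 >= gap(hi) for the values used
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n, b = 64, 2
d = b + 1
print(balance_p_case22(n, 0, b), n - math.log2(n) / d)    # gamma = 0
print(balance_p_case22(n, 16, b), n * (1 - 16 / n))       # gamma = 0.25
```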


CASE 2.3: Let $d > b+1$. In this case, we find an optimal result for $y = 1$ and $p \ge n-m$. We obtain from (16)-(19):

$$A(P) = O\big((n-p+1)\,2^{dn-bp} + 2^{n(d-b-1)+(b+1)m+p}\big).$$

$A(P)$ is minimal if the terms are equal in size. With respect to the restriction $p \le n$, this leads to the result:

REMARK 8.4: Let an NN-system ($v = 0$) be given. Let the problem dimension $d$ exceed the system dimension $b$ by at least 2 ($d > b+1$). For efficient algorithms of the considered kind, the system size $P = 2^{bp}$ is then bounded by

$$p = \begin{cases} n + O(1) & \text{if } \gamma = 0, \\ n(1-\gamma) + \frac{1}{b+1}\log n + O(1) & \text{if } 0 < \gamma < 1. \end{cases}$$

A redistribution of subdomains is not necessary. The achievable speedup is

$$S(P) = \Omega(P) = \begin{cases} \Omega(N^{b/d}) & \text{if } \gamma = 0, \\ \Omega\big(N^{(1-\gamma)b/d}\cdot\log^{b/(b+1)} N\big) & \text{if } 0 < \gamma < 1. \end{cases}$$
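For Case 2.3, the two terms of the cost expression can be balanced in the same way, under the restriction $p \le n$. After taking base-2 logarithms, $d$ cancels and the balance reads $\log(n-p+1) + (b+1)(n-m-p) = 0$. The hypothetical helper below solves this by bisection on $[n-m, n]$, where the left-hand side changes sign. The result can then be compared with Remark 8.4.

```python
import math

def balance_p_case23(n, m, b, iters=100):
    """Root p in [n-m, n] of log2(n-p+1) + (b+1)*(n-m-p) = 0 (Case 2.3, d > b+1)."""

    def gap(p):
        return math.log2(n - p + 1) + (b + 1) * (n - m - p)

    # gap(n-m) = log2(m+1) >= 0 and gap(n) = -(b+1)*m <= 0
    lo, hi = float(n - m), float(n)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n, b = 64, 2
print(balance_p_case23(n, 0, b))                               # gamma = 0: p = n
print(balance_p_case23(n, 16, b), n * 0.75 + math.log2(n) / (b + 1))
```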


9.  BREAK-EVEN  POINT  FOR  REDISTRIBUTION-FREE  SUBSEQUENCES  OF  GRIDS 

There are two independent reasons why optimal redistribution-free subsequences of grids degenerate to length 1 in case of small problems. First, large start-up times for messages require a short distance between redistributions. This influence depends on the properties of the real system; we shall not discuss these problems here in detail. Secondly, the domain data exchange between subsequent grids exceeds the boundary data exchange only if the subdomains contain enough points.

In our method, a subdomain has $2^{(d-b)n+b(n-p)}$ grid points at the level $n+l$ with the mapping parameter $l$ (cf. section 5). At the level $n+l$, the non-empty subdomains are mapped onto the system like an isomorphism of neighborhoods. The inner boundary faces are in this case of the size $2^{(d-b)n+(b-1)(n-p)}$.

Let us consider V(1,1)-cycling only. This means one relaxation step per grid in every part of the V-cycle. Therefore, every inner boundary face must be sent about three times (twice for relaxation and once for defect computation and interpolation). We have $2b$ inner boundary faces, and we can use $b$ buses per processor. Each bus is working for $2^{v}$ processors. In case of redistribution-free subsequences of length 2, the boundary data exchange is increased by $2b$ additional inner boundary faces per processor for one of the two grids. This accumulates to

$$3\cdot 2\cdot 2^{v}\cdot 2^{(d-b)n+(b-1)(n-p)}. \tag{23}$$

One domain data exchange is saved in this case. Within a V-cycle, the domain data exchange must be carried out twice per grid. Almost all points (i.e., $2^{dn}$) are involved in the redistribution. Only the coarse points must be transported. The transfer is done by $b\,2^{(b-1)(p-1)}$ connection lines. In case of $b = 2$ dimensions, for example, these are the lines that leave the upper left quarter of the system. Therefore, we save

$$2^{1-d}\cdot\frac{2^{dn-(b-1)(p-1)}}{b}. \tag{24}$$

The break-even point of our method is the size of $n$ where (24) exceeds (23).

REMARK 9.1: In V(1,1)-cycling, the problem size that is necessary for redistribution-free subsequences of length $\ge 2$ is bounded by

$$n > d + 1 + v + \log(3b) - b.$$

EXAMPLE: In case of 2-dimensional systems ($b = 2$), we obtain the lower bounds $n > d+1+p$ for bus systems ($v = p$) and $n > d+1$ for NN-systems ($v = 0$).


EXAMPLE: Consider the SUPRENUM architecture (cf. [6]). This architecture has a hierarchical connection system of buses arranged in two levels. At the lower level, 16 processors are connected to a cluster via a powerful parallel bus. At the upper level, $2^{2p}$ clusters are connected like a 2-dimensional bus-coupled system. Therefore, a SUPRENUM system can be included in our consideration as a 2-dimensional bus-coupled system with clusters instead of processors (cf. [11]). A system with $4 \times 4 = 16$ clusters has the parameter $p = 2$. Our method can be of advantage if $n > d+3$, i.e., $n \ge 7$ in case of 3-dimensional problems.
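The bound of Remark 9.1 follows from comparing the base-2 logarithms of (23) and (24), in which the parameter $p$ cancels. The Python sketch below searches for the smallest integer $n$ at which the saving (24) exceeds the extra boundary exchange (23). Its helper names are hypothetical. It reproduces the SUPRENUM figure for $b = 2$, $v = p = 2$ and $d = 3$.

```python
import math

def log2_extra_boundary(n, d, b, p, v):
    """log2 of (23): 3 * 2 * 2**v * 2**((d-b)*n + (b-1)*(n-p))."""
    return math.log2(6) + v + (d - b) * n + (b - 1) * (n - p)

def log2_saved_domain(n, d, b, p):
    """log2 of (24): 2**(1-d) * 2**(d*n - (b-1)*(p-1)) / b."""
    return 1 - d + d * n - (b - 1) * (p - 1) - math.log2(b)

def break_even_n(d, b, p, v):
    """Smallest integer n for which the saving (24) exceeds the extra cost (23)."""
    n = 1
    while log2_saved_domain(n, d, b, p) <= log2_extra_boundary(n, d, b, p, v):
        n += 1
    return n

# SUPRENUM-like bus system: b = 2, p = 2, v = p; 3-dimensional problem.
print(break_even_n(d=3, b=2, p=2, v=2))   # 7
# NN-system (v = 0) with the same b, p and d.
print(break_even_n(d=3, b=2, p=2, v=0))   # 5
```

The difference of the two logarithms is $n - d - 1 - v + b - \log(3b)$, which is exactly the condition stated in Remark 9.1.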
10.  SUMMARY

The knowledge about the location of the local grids is essential for parallelizing multigrid methods with local refinements on sparsely connected non-shared memory systems. If the location is known a priori, then a partition and mapping exist with a speedup similar to that for problems without local refinements. In case of V-cycling, identical dimension ($d = b$), $N = 2^{dn}$ as the size of the finest global grid, and $2^{bp}$ as system size, the speedup $S(P) = \Omega(N^{1/2})$ can be achieved for bus systems and $S(P) = \Omega(N^{d/(d+1)})$ for NN-systems.

The situation is completely different if the location of local grids is determined dynamically. In case of multigrid methods with more local grids than global grids, the systems considered are suitable to a very restricted extent only. For an efficient algorithm, $P = O(\log^b N)$ is required, i.e., the system size $P$ must be very small compared with the problem size $O(N(m+1))$. Nearest-neighbor systems and bus-coupled systems do not show significant differences from the viewpoint of connection structure. Within the sequence of local grids, a redistribution of the domain should be made every $(n-p+\text{const})$-th grid level. Within such a redistribution-free subsequence of grids, the scattered decomposition can be used.

If there are only a few local grids, the differences between the structure types considered are significant. With a decreasing number $m$ of local grids, we observe a nearly continuous transition of the speedup achievable on the specific structure type to the known results for the case $m = \gamma n = 0$ (cf. [10] and [11]). If $\gamma$ is small and in case of equal dimension ($d = b$), the achievable speedup is

$$S(P) = \Omega\Big(\big(N^{1-\gamma}(m+1)^d\big)^{1/2}\Big) \quad\text{for bus systems if } \gamma < 1$$

and

$$S(P) = \Omega\big((N(m+1))^{d/(d+1)}\big) \quad\text{for NN-systems if } 0 < \gamma < d/(d+1)^2.$$

With a 'small' $m$ ($m < \text{const.}$), redistribution may be dropped completely. In this case we again obtain a system behavior similar to that observed for multigrid methods without local refinements.

For the method considered, the number $m$ of local grids has been assumed to be known. If this number is unknown, it would have to be estimated before processing to determine the levels of rebalancing. The efficiency of the method would then depend on the quality of this estimate.

The question of the optimality of the method must be left open here. Certainly, in case of unequal dimension ($d > b$), the result is not optimal due to the mapping used, as is already known for the case $m = 0$ (cf. [10], [11]).
Analysis  of  a  Multigrid  Method
for  the  Euler  Equations  of  Gas 
Dynamics  in  Two  Dimensions 


Wim  A.  Mulder* 

Department  of  Computer  Science 
Stanford  University 
Stanford,  CA  94305-2140 

The  multigrid  convergence  factors  of  several  relaxation  schemes  for  the  linearised  upwind- 
differenced Euler equations are estimated by two-level local-mode analysis. Strong alignment, the flow being aligned with the grid, causes the failure of schemes that use only local
data,  such  as  Point-Jacobi,  Red-Black,  and  Block-Jacobi  relaxation.  Damped  collective 
symmetric  Gauss-Seidel  relaxation  and  an  undamped  version  of  Gauss-Seidel  relaxation, 
with  sweeps  in  all  four  directions,  are  both  global  relaxation  schemes  and  can  overcome 
this  problem  in  the  case  of  pure  convection.  However,  they  still  fail  for  the  full  system  of 
equations.  This  is  confirmed  by  numerical  experiments  for  the  nonlinear  Euler  equations. 

1.  Introduction 

The  multigrid  method  is  an  efficient  numerical  technique  for  solving  elliptic  equations.  It 
provides  solutions  within  the  truncation  error  for  an  amount  of  work  proportional  to  the 
number  of  points  or  cells  in  the  computational  domain.  Moreover,  only  one  or  a  few  multigrid 
iterations are required for regular elliptic problems. The theory for this kind of problem is well established. For details, the reader is referred to the textbook by Hackbusch [3].

Several attempts have been made to use the multigrid technique for the computation of steady solutions to hyperbolic partial differential equations, specifically the Euler equations that describe the flow of an inviscid compressible gas. Ni [15] was the first to obtain a significant acceleration with respect to a single-grid Lax-Wendroff scheme by using multiple grids. He employs explicit time-stepping as a relaxation scheme, which is hardly efficient. Jameson [6] uses central differencing, a four-stage Runge-Kutta time-stepping scheme, residual averaging, and enthalpy damping. Multigrid accelerates his scheme significantly. Jespersen [7] adopts a different approach, which is closely related to the standard multigrid technique for elliptic equations. Upwind differencing by means of flux-vector splitting [19] is used for the spatial discretisation and Symmetric Gauss-Seidel (SGS) for relaxation. Both the Correction Scheme (CS) and the Full Approximation Storage scheme (FAS) are studied. In the former, a global linearisation of the residual is computed, and the multigrid technique is applied to the linear system. In the latter, the nonlinear equations are used directly during the multigrid cycle.

In early 1983, without being aware of the work just mentioned, I implemented a multigrid method after reading a paper by Brandt [2], and found grid-independent convergence factors for a transonic test problem with a shock [11]. A flux-vector-splitting version by van Leer [22] is used for the upwind differencing of the isenthalpic Euler equations. This approximate Riemann solver is continuously differentiable, a property shown to be desirable in [10]. Grid transfer is based on the finite-volume residual operator, resulting in Galerkin coarsening. Symmetric Gauss-Seidel is chosen as the relaxation scheme, based on earlier work in [23]. The CS scheme is used to solve the linear system arising from a Switched Evolution/Relaxation (SER) method. The latter can be viewed as a global Newton method. It is derived from a “backward Euler” implicit time-discretisation, with a “time-step” inversely proportional to some norm of the residual. Thus, if the solution is far away from the steady state, the residual is large, and a more or less time-accurate integration is carried out. Once the solution approaches the steady state, the scheme switches to Newton’s method. The inclusion of the finite “time-step” is necessary in the case of a singular residual to avoid divergence. With this method and for the specific transonic test problem, grid-independent convergence factors are found both for a first-order- and a second-order-accurate spatial discretisation. For the latter, a version of the Defect Correction Method [3:Eq.(14.3.1)] is used with a second-order-accurate residual and a linear system based on a first-order discretisation. Second-order accuracy is obtained by van Leer’s technique [20,21]. The nonlinear generalisation of this method is sketched at the end of [11], but no experimental results are presented.

Project at Stanford under the Office of Naval Research Contract N00014-82-K-0335.

* Present address: Department of Mathematics, 405 Hilgard Avenue, University of California at Los Angeles, Los Angeles, CA 90024-1555.

In  hindsight,  this  method  turns  out  to  be  very  similar  to  Jespersen’s.  The  main  difference 
is the finite-volume approach leading to volume-averaging for restriction and zero-order interpolation for prolongation, rather than the nodal point approach of Jespersen that requires
full  weighting  for  restriction  and  bilinear  interpolation  for  prolongation. 

Following the work in [7] and [11], several authors have experimented with the multigrid method for the upwind-differenced Euler equations, using either the Correction or the FAS scheme. Hemker and Spekreijse [4] incorporate first-order upwind differencing, Galerkin coarsening, and nonlinear SGS relaxation in a FAS scheme. Osher’s scheme [16] rather than van Leer’s flux-vector splitting (FVS) is used for the upwind differencing. This scheme is continuously differentiable, just as FVS, but more accurate (at a higher cost). The higher accuracy allows for overspecification at the boundaries, in contrast to FVS where characteristic boundary conditions are required. Apart from this and the nonlinear implementation, the fundamental difference between their approach and mine is the omission of the “time-step”. This will cause their method to diverge in case of a locally singular residual, whereas the insertion of a “time-step” would at least guarantee stability. This issue will be discussed in more detail in §7. In spite of this, their method provides grid-independent convergence factors for a test problem similar to the one in [11]. Second-order-accurate results with Osher’s scheme for the upwind differencing, van Leer’s technique [20,21] for the second-order accuracy, and the Defect Correction technique for computing the solution are presented in [5,9].

An  application  with  strong  shocks  can  be  found  in  [12].  There,  the  relaxation  scheme  is 
symmetric  line  Gauss-Seidel,  with  the  line  relaxation  in  the  periodic  direction  of  the  polar 
grid. Grid-independent convergence cannot be observed, as the computations are restricted
to  relatively  coarse  grids.  Three-dimensional  computations  with  the  FAS  scheme  and  flux- 
vector  splitting  have  been  carried  out  by  Anderson  [1].  The  relaxation  scheme  is  Approximate 
Factorisation  and  the  results  suggest  grid-independent  convergence  factors.  An  extension  of 
the  work  in  [11]  to  the  Navier-Stokes  equations  is  reported  in  [17].  Here  grid-independent 
convergence  factors  are  obtained  as  well.  A  later  example  is  [18]. 

In summary, there is experimental evidence that the combination of upwind differencing and Symmetric Gauss-Seidel or another type of relaxation scheme is capable of producing grid-independent convergence factors. It should be stressed, however, that none of the studies mentioned above has been extended to very fine grids. Also, there are practically no theoretical results to support the claims of grid-independent convergence rates.

An  attempt  to  predict  multigrid  convergence  factors  for  purely  convective  equations  can 
be  found  in  [13].  There  the  one-dimensional  scalar  inviscid  Burgers’  equation  is  considered. 
Of  course,  it  does  not  make  much  sense  to  use  the  multigrid  method  for  one-dimensional 
problems.  However,  some  interesting  results  were  found.  First  of  all,  it  turns  out  that 
optimising  the  smoothing  rate  of  the  relaxation  scheme  does  not  necessarily  imply  a  good 
multigrid convergence rate. Secondly, the discrete equations become singular at the shock. This is consistent with the differential equations. With a special treatment of shocks after
prolongation,  a  good  agreement  between  experimental  and  predicted  convergence  factors  is 
obtained.  Otherwise,  convergence  is  slower  than  predicted,  but  still  acceptable  for  some 
relaxation  schemes.  Damped  Point-Jacobi  relaxation  appears  to  be  the  most  attractive 
scheme. 

In  this  paper  we  extend  the  two-level  local-mode  analysis  to  two  dimensions.  Only  the 
linearised  Euler  equations  with  constant  coefficients  and  periodic  boundaries  are  considered 
(§2).  Nonlinear  effects  are  not  addressed  in  this  paper.  The  upwind  discretisation  is  described 
in  §3.  The  coarse-grid  correction  operator  is  evaluated  in  §4.  It  describes  the  result  of 
restriction  to  a  coarser  grid,  solving  the  coarse-grid  equations  exactly,  and  prolongating  the 
coarse-grid  correction  back  to  the  fine  grid.  Several  relaxation  schemes  are  considered  in  §5. 
As  in  the  one-dimensional  case  [13],  we  would  prefer  to  have  a  scheme  that  uses  only  local 
information,  as  these  schemes  are  easily  vectorised  and  are  more  convenient  to  use  on  parallel 
architectures.  Also,  their  flexibility  makes  them  better  suited  for  applications  with  adaptive 
grid-refinement.  Investigated  are:  Point-Jacobi  relaxation,  a  Multi-Stage  scheme,  Red-Black 
or  checkerboard  relaxation,  and  Block-Jacobi.  As  global  relaxation  schemes,  Gauss-Seidel 
relaxation  and  its  symmetric  variants  are  considered. 

Multigrid  convergence  factors  are  estimated  in  §6.  The  schemes  just  mentioned  fail 
because  of  strong  alignment,  the  flow  being  aligned  with  the  grid,  which  is  a  well-known 
problem  for  elliptic  equations  with  strongly  anisotropic  coefficients  [2,3].  For  pure  convection, 
this  problem  can  be  overcome  by  damped  symmetric  Gauss-Seidel  (SGS)  or  by  a  version  of 
Gauss-Seidel with sweeps in all four directions (S2GS). However, these global relaxation schemes still fail for the full system of Euler equations.

Because the Fourier modes are not the proper eigenfunctions of the GS relaxation operator, some numerical experiments on the nonlinear Euler equations are carried out (§7). The failure of S2GS and damped SGS is confirmed.

The  main  results  are  summarised  in  §8.  Some  alternatives  for  obtaining  uniformly  good 
convergence  rates  are  discussed. 


2.  Model  equations 

The  two-level  local-mode  analysis  [2]  will  be  carried  out  on  a  linearised  form  of  the  Euler 
equations.  These  equations  are  given  below. 

The Euler equations in conservation form, describing the dynamics of an inviscid compressible gas, are

$$\frac{\partial w'}{\partial t} + \frac{\partial f}{\partial x} + \frac{\partial g}{\partial y} = 0. \tag{2.1a}$$
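The vector of states $w'$ and the fluxes $f$ and $g$ are

$$w' = \begin{pmatrix} \rho \\ \rho u \\ \rho v \\ \rho E \end{pmatrix}, \quad f = \begin{pmatrix} \rho u \\ \rho u^2 + p \\ \rho u v \\ \rho u H \end{pmatrix}, \quad g = \begin{pmatrix} \rho v \\ \rho u v \\ \rho v^2 + p \\ \rho v H \end{pmatrix}. \tag{2.1b}$$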


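Here $\rho$ is the density of the gas, and $u$ and $v$ are the $x$- and $y$-component of the velocity, respectively. The energy $E$, total enthalpy $H$, pressure $p$, and sound speed $c$ are related by

$$E = \frac{p}{(\gamma-1)\rho} + \frac{1}{2}(u^2+v^2), \qquad H = E + p/\rho, \qquad c^2 = \gamma p/\rho. \tag{2.2}$$

Linearising (2.1) and applying a similarity transform based on

$$P = \rho\begin{pmatrix} 0 & 0 & 1/c & -1/\gamma \\ 1 & 0 & u/c & -u/\gamma \\ 0 & 1 & v/c & -v/\gamma \\ u & v & H/c & -\tfrac{1}{2}(u^2+v^2)/\gamma \end{pmatrix}, \tag{2.3}$$

we obtain

$$\frac{\partial w}{\partial t} + A\frac{\partial w}{\partial x} + B\frac{\partial w}{\partial y} = 0, \tag{2.4a}$$

with the symmetric matrices

$$A = P^{-1}\frac{\partial f}{\partial w'}P = \begin{pmatrix} u & 0 & c & 0 \\ 0 & u & 0 & 0 \\ c & 0 & u & 0 \\ 0 & 0 & 0 & u \end{pmatrix}, \qquad B = P^{-1}\frac{\partial g}{\partial w'}P = \begin{pmatrix} v & 0 & 0 & 0 \\ 0 & v & c & 0 \\ 0 & c & v & 0 \\ 0 & 0 & 0 & v \end{pmatrix}. \tag{2.4b}$$

The vector $w$ obeys

$$\delta w = P^{-1}\delta w' = \begin{pmatrix} \delta u \\ \delta v \\ \delta p/(\rho c) \\ \delta S \end{pmatrix}, \tag{2.5}$$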

where the specific entropy is $S = \log(p/\rho^\gamma)$. The fourth equation of the system (2.4) describes the convection of the entropy along streamlines. The remaining $3\times 3$ system represents the combination of convection and sound waves. In the isentropic case, the fourth equation can be dropped and the third component of $w$ (2.5) becomes $2c/(\gamma-1)$.
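The transform (2.3) and the symmetric matrices (2.4b) can be checked numerically. The sketch below computes the flux Jacobians of (2.1b) by central finite differences and forms $P^{-1}(\partial f/\partial w')P$ and $P^{-1}(\partial g/\partial w')P$. It assumes an ideal gas with $\gamma = 1.4$ and an arbitrary sample state. The helper names are hypothetical. The check is only a consistency test of the formulas, not part of the analysis itself.

```python
import numpy as np

GAMMA = 1.4  # assumed ratio of specific heats (the text keeps gamma general)

def flux_x(w):
    """Flux f(w') of (2.1b) for w' = (rho, rho*u, rho*v, rho*E)."""
    rho, mu, mv, rE = w
    u, v = mu / rho, mv / rho
    p = (GAMMA - 1) * (rE - 0.5 * rho * (u**2 + v**2))
    H = (rE + p) / rho
    return np.array([mu, mu * u + p, mu * v, mu * H])

def flux_y(w):
    """Flux g(w') of (2.1b)."""
    rho, mu, mv, rE = w
    u, v = mu / rho, mv / rho
    p = (GAMMA - 1) * (rE - 0.5 * rho * (u**2 + v**2))
    H = (rE + p) / rho
    return np.array([mv, mv * u, mv * v + p, mv * H])

def jacobian(f, w, eps=1e-6):
    """Central finite-difference approximation of df/dw'."""
    J = np.zeros((4, 4))
    for k in range(4):
        dw = np.zeros(4)
        dw[k] = eps * max(1.0, abs(w[k]))
        J[:, k] = (f(w + dw) - f(w - dw)) / (2 * dw[k])
    return J

def transform(rho, u, v, p):
    """Similarity transform P of (2.3) and the sound speed c."""
    c = np.sqrt(GAMMA * p / rho)
    H = c**2 / (GAMMA - 1) + 0.5 * (u**2 + v**2)
    P = rho * np.array([[0, 0, 1 / c, -1 / GAMMA],
                        [1, 0, u / c, -u / GAMMA],
                        [0, 1, v / c, -v / GAMMA],
                        [u, v, H / c, -0.5 * (u**2 + v**2) / GAMMA]])
    return P, c

rho, u, v, p = 1.2, 0.3, -0.2, 0.9
P, c = transform(rho, u, v, p)
E = p / ((GAMMA - 1) * rho) + 0.5 * (u**2 + v**2)
w = np.array([rho, rho * u, rho * v, rho * E])
A = np.linalg.solve(P, jacobian(flux_x, w) @ P)     # P^{-1} (df/dw') P
B = np.linalg.solve(P, jacobian(flux_y, w) @ P)     # P^{-1} (dg/dw') P
print(np.round(A, 6))
print(np.round(B, 6))
```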

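The matrix $\kappa_1 A + \kappa_2 B$, with $\kappa_1^2 + \kappa_2^2 = 1$, can be diagonalised:

$$\kappa_1 A + \kappa_2 B = Q\Lambda Q^{-1}, \tag{2.6a}$$

where

$$Q = \begin{pmatrix} \kappa_1 & \kappa_2 & 0 & \kappa_1 \\ \kappa_2 & -\kappa_1 & 0 & \kappa_2 \\ -1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}, \tag{2.6b}$$

$$\Lambda = \operatorname{diag}\big(\kappa_1 u + \kappa_2 v - c,\; \kappa_1 u + \kappa_2 v,\; \kappa_1 u + \kappa_2 v,\; \kappa_1 u + \kappa_2 v + c\big). \tag{2.6c}$$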
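The eigendecomposition (2.6) can be verified directly with NumPy. The sketch below builds $A$ and $B$ from (2.4b) and $Q$, $\Lambda$ from (2.6b) and (2.6c), then checks that $Q\Lambda Q^{-1}$ reproduces $\kappa_1 A + \kappa_2 B$ for one unit direction. The sample state and the angle are arbitrary assumptions, and the function names are hypothetical.

```python
import numpy as np

def flux_matrices(u, v, c):
    """Symmetric matrices A and B of (2.4b)."""
    A = np.array([[u, 0, c, 0], [0, u, 0, 0], [c, 0, u, 0], [0, 0, 0, u]], dtype=float)
    B = np.array([[v, 0, 0, 0], [0, v, c, 0], [0, c, v, 0], [0, 0, 0, v]], dtype=float)
    return A, B

def eigen_decomposition(k1, k2, u, v, c):
    """Q and Lambda of (2.6b) and (2.6c); requires k1**2 + k2**2 == 1."""
    Q = np.array([[k1, k2, 0, k1],
                  [k2, -k1, 0, k2],
                  [-1, 0, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    q = k1 * u + k2 * v
    Lam = np.diag([q - c, q, q, q + c])
    return Q, Lam

u, v, c = 0.3, -0.2, 1.0
angle = 0.7
k1, k2 = np.cos(angle), np.sin(angle)   # unit direction (k1, k2)
A, B = flux_matrices(u, v, c)
Q, Lam = eigen_decomposition(k1, k2, u, v, c)
M = k1 * A + k2 * B
print(np.allclose(M, Q @ Lam @ np.linalg.inv(Q)))
```

The normalisation $\kappa_1^2 + \kappa_2^2 = 1$ matters. It is used in the third component of the first and last eigenvectors.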

In  the  following  sections  we  will  consider  the  linear  system  (2.4)  with  constant  coefficients 
and  periodic  boundary  conditions. 

3.  Discretisation 

The spatial discretisation of (2.4) is obtained by upwind differencing. The upwind differencing is carried out separately for the $x$- and $y$-direction. For each characteristic variable, the upwind direction is determined from the eigenvalues of $A$ or $B$. The resulting residual operator is given below. The singularities of its symbol are listed as well.

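For the upwind differencing in the $x$-direction, the matrix $A$ is diagonalised by

$$A = Q_1\Lambda_1Q_1^{-1}, \quad Q_1 = Q(\kappa_1 = 1, \kappa_2 = 0), \quad \Lambda_1 = \Lambda(\kappa_1 = 1, \kappa_2 = 0), \tag{3.1a}$$

where $\Lambda_1$ is the diagonal matrix. For the $y$-direction we have

$$B = Q_2\Lambda_2Q_2^{-1}, \quad Q_2 = Q(\kappa_1 = 0, \kappa_2 = 1), \quad \Lambda_2 = \Lambda(\kappa_1 = 0, \kappa_2 = 1). \tag{3.1b}$$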

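We define $\Lambda^+$ and $\Lambda^-$ as the matrices that contain the positive and negative elements of $\Lambda$, respectively. This implies

$$\Lambda^+ + \Lambda^- = \Lambda, \qquad \Lambda^+ - \Lambda^- = |\Lambda|.$$

Now define

$$A^\pm = Q_1\Lambda_1^\pm Q_1^{-1}, \qquad B^\pm = Q_2\Lambda_2^\pm Q_2^{-1}.$$

It follows that

$$A = A^+ + A^-, \quad |A| = Q_1|\Lambda_1|Q_1^{-1} = A^+ - A^-; \qquad B = B^+ + B^-, \quad |B| = Q_2|\Lambda_2|Q_2^{-1} = B^+ - B^-.$$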
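The splitting into $A^+$ and $A^-$ can be sketched in a few lines. The code below uses $Q_1$ and $\Lambda_1$ from (3.1a) for an assumed subsonic sample state, $u - c < 0 < u + c$. It checks the identities $A = A^+ + A^-$ and $|A| = A^+ - A^-$. The helper name `upwind_split` is hypothetical.

```python
import numpy as np

def upwind_split(Q, lam):
    """Return (M+, M-, |M|) for M = Q diag(lam) Q^{-1}."""
    Qinv = np.linalg.inv(Q)
    plus = Q @ np.diag(np.maximum(lam, 0.0)) @ Qinv      # keep positive eigenvalues
    minus = Q @ np.diag(np.minimum(lam, 0.0)) @ Qinv     # keep negative eigenvalues
    absolute = Q @ np.diag(np.abs(lam)) @ Qinv
    return plus, minus, absolute

u, c = 0.3, 1.0                          # subsonic sample state
A = np.array([[u, 0, c, 0], [0, u, 0, 0], [c, 0, u, 0], [0, 0, 0, u]])
Q1 = np.array([[1, 0, 0, 1], [0, -1, 0, 0], [-1, 0, 0, 1], [0, 0, 1, 0]], dtype=float)
lam1 = np.array([u - c, u, u, u + c])    # diagonal of Lambda_1
A_plus, A_minus, A_abs = upwind_split(Q1, lam1)
print(np.allclose(A_plus + A_minus, A), np.allclose(A_plus - A_minus, A_abs))
```

The columns of $Q_1$ are mutually orthogonal, so $|A|$ comes out symmetric and positive semi-definite. The proof of Lemma 3.1 relies on this property.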

The upwind-differenced linear residual operator is

$$L^h = \frac{1}{h_x}\left[A^+(1 - T_x^{-1}) + A^-(T_x - 1)\right] + \frac{1}{h_y}\left[B^+(1 - T_y^{-1}) + B^-(T_y - 1)\right]. \tag{3.5}$$

The shift operators $T_x$ and $T_y$ are defined by $T_x w_{k_1,k_2} = w_{k_1+1,k_2}$ and $T_y w_{k_1,k_2} = w_{k_1,k_2+1}$. Only a uniform grid will be considered ($h_x = h_y = h$).

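The steady-state problem is written in terms of the error $v^h = w^h - \bar w^h$, where $\bar w^h$ is the stationary solution. The Fourier transform of $v^h$ for an $N_1 \times N_2$ grid is

$$\hat v^h(\theta_x, \theta_y) = \frac{1}{N_1 N_2} \sum_{k_1=0}^{N_1-1} \sum_{k_2=0}^{N_2-1} v^h_{k_1,k_2} \exp\left[-i\,(k_1\theta_x + k_2\theta_y)\right], \tag{3.6a}$$

where the frequencies are

$$\theta_x = \frac{2\pi i_1}{N_1}, \quad \theta_y = \frac{2\pi i_2}{N_2}, \qquad i_1 = -\left(\tfrac12 N_1 - 1\right), \ldots, \tfrac12 N_1, \quad i_2 = -\left(\tfrac12 N_2 - 1\right), \ldots, \tfrac12 N_2. \tag{3.6b}$$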
The symbols of the shift operators $T_x$ and $T_y$ are

$$T_x = \exp(i\theta_x), \quad T_y = \exp(i\theta_y), \qquad -\pi < \theta_x \le \pi, \quad -\pi < \theta_y \le \pi. \tag{3.7}$$
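Up to the factor $1/(N_1N_2)$ and the ordering of the frequencies, the transform (3.6a) is what NumPy's `fft2` computes. The small sketch below uses arbitrary grid sizes and random data to check that a periodic shift by $T_x$ (or $T_y$) multiplies every Fourier coefficient by $\exp(i\theta_x)$ (or $\exp(i\theta_y)$), as stated in (3.7). The function name `transform` is a hypothetical helper.

```python
import numpy as np

N1, N2 = 8, 6
rng = np.random.default_rng(1)
v = rng.standard_normal((N1, N2))                # grid function v[k1, k2]

def transform(a):
    """(3.6a): (1/(N1 N2)) * sum_k a[k1, k2] exp(-i (k1 theta_x + k2 theta_y))."""
    return np.fft.fft2(a) / a.size

# theta = 2 pi i / N in fft2 output order; fftfreq maps the upper half of the
# indices to negative frequencies, which only changes theta by a multiple of 2 pi.
theta_x = 2 * np.pi * np.fft.fftfreq(N1)[:, None]
theta_y = 2 * np.pi * np.fft.fftfreq(N2)[None, :]

Tx_v = np.roll(v, -1, axis=0)                    # (T_x v)[k1, k2] = v[k1 + 1, k2], periodic
Ty_v = np.roll(v, -1, axis=1)                    # (T_y v)[k1, k2] = v[k1, k2 + 1]

ok_x = np.allclose(transform(Tx_v), np.exp(1j * theta_x) * transform(v))
ok_y = np.allclose(transform(Ty_v), np.exp(1j * theta_y) * transform(v))
print(ok_x, ok_y)   # prints: True True
```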

Lemma 3.1. The linearised residual operator $L^h$ is singular only in the following cases:

(i) $T_x = 1$, $T_y = 1$;

(ii) $T_x \ne 1$, $T_y = 1$: $u = -c$ or $u = 0$ or $u = c$;

(iii) $T_x = 1$, $T_y \ne 1$: $v = -c$ or $v = 0$ or $v = c$;

(iv) $T_x \ne 1$, $T_y \ne 1$: $u = v = 0$. (3.8)

Proof. In the first case, the linearised residual operator $L^h = 0$. In the second case we have

$$hL^h = A^+(1 - T_x^{-1}) + A^-(T_x - 1). \tag{3.9}$$

This expression can be diagonalised by $Q_1$, yielding the eigenvalues

$$|\lambda_{1l}|(1 - \cos\theta_x) + i\lambda_{1l}\sin\theta_x, \qquad l = 1, \ldots, 4, \tag{3.10}$$

where $\lambda_{1l}$ denotes the $l$-th diagonal element of $\Lambda_1$. Since $T_x \ne 1$ implies $1 - \cos\theta_x > 0$, this expression vanishes only if $\lambda_{1l} = 0$, i.e., if one of the eigenvalues of A vanishes. The third case is proven in the same way. For case (iv) we write the linearised residual operator as

$$hL^h = |A|(1 - \cos\theta_x) + |B|(1 - \cos\theta_y) + i\,(A\sin\theta_x + B\sin\theta_y). \tag{3.11}$$

A necessary condition for $L^h$ to be singular is that its real part have a zero eigenvalue. The matrix

$$\mu_1|A| + \mu_2|B|, \qquad \mu_1 > 0, \quad \mu_2 > 0, \tag{3.12}$$

is singular only for $u = v = 0$. It is easily seen that $L^h$ is also singular for this choice. □
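Lemma 3.1 can be probed numerically by evaluating the symbol (3.5) with $h = 1$. This sketch assumes Q and Λ as written in (2.6); the helpers `split`, `symbol` and `smallest_sv` are hypothetical names. It compares the eigenvalues for $T_y = 1$ with (3.10), and shows that the smallest singular value of the symbol drops to zero when $u = 0$ (case (ii)) but not for a generic $u$.

```python
import numpy as np

def split(k1, k2, u, v, c):
    """A^+/A^- (or B^+/B^-) from (2.6) and (3.3), plus the eigenvalues Lambda."""
    Q = np.array([[k1, k2, 0.0, k1],
                  [k2, -k1, 0.0, k2],
                  [-1.0, 0.0, 0.0, 1.0],
                  [0.0, 0.0, 1.0, 0.0]])
    q = k1 * u + k2 * v
    lam = np.array([q - c, q, q, q + c])
    Qinv = np.linalg.inv(Q)
    return (Q @ np.diag(np.maximum(lam, 0.0)) @ Qinv,
            Q @ np.diag(np.minimum(lam, 0.0)) @ Qinv, lam)

def symbol(u, v, c, th_x, th_y):
    """Symbol of h L^h from (3.5) with h = 1."""
    Ap, Am, _ = split(1.0, 0.0, u, v, c)
    Bp, Bm, _ = split(0.0, 1.0, u, v, c)
    Tx, Ty = np.exp(1j * th_x), np.exp(1j * th_y)
    return (Ap * (1 - 1 / Tx) + Am * (Tx - 1)
            + Bp * (1 - 1 / Ty) + Bm * (Ty - 1))

u, v, c, th = 0.4, 0.7, 1.0, 0.9

# Case (ii), T_y = 1: eigenvalues should follow (3.10).
eig = np.sort_complex(np.linalg.eigvals(symbol(u, v, c, th, 0.0)))
_, _, lam1 = split(1.0, 0.0, u, v, c)
pred = np.sort_complex(np.abs(lam1) * (1 - np.cos(th)) + 1j * lam1 * np.sin(th))
print(np.allclose(eig, pred))

def smallest_sv(u, v):
    """Smallest singular value of the symbol at (theta_x, theta_y) = (th, 0)."""
    return np.linalg.svd(symbol(u, v, c, th, 0.0), compute_uv=False)[-1]

print(smallest_sv(0.0, v), smallest_sv(u, v))   # ~0 for u = 0, clearly nonzero otherwise
```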

4.  Coarse-grid  correction  operator 

An estimate of the multigrid convergence rate for a given residual operator can be obtained by considering two grids, a fine one and a coarse one. The multigrid convergence factor is determined by the combination of relaxation on the fine grid and corrections to the solution from the coarse grid. Here we will describe the latter. Attention is given to the singularities of the coarse-grid equations and to the stability of the coarse-grid correction operator.

The coarse-grid correction (CGC) operator K describes the effect of the following sequence of operations. First the fine-grid residual $L^h v^h$ is restricted to the coarser grid. The restriction operator is written as $I_h^H$, where $H = 2h$. Next it is assumed that the steady-state problem on the coarser grid is solved exactly. Finally, the coarse-grid correction is prolongated back to the fine grid. The prolongation operator is denoted by $I_H^h$. The CGC operator is given by

$$K = I - I_H^h (L^H)^{-1} I_h^H L^h. \tag{4.1}$$

Here I is the identity operator. Volume-averaging is adopted for the restriction operator, and zero-order interpolation is used for prolongation [11].
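The restriction operator introduces a coupling between the frequencies on the fine grid: $\theta_x$ is coupled with $\theta_x + \pi$, and $\theta_y$ with $\theta_y + \pi$. For brevity we define

$$\begin{aligned} \hat v^h_{++} &= \hat v^h(T_x, T_y) = \hat v^h(\theta_x, \theta_y), \\ \hat v^h_{-+} &= \hat v^h(-T_x, T_y) = \hat v^h(\theta_x + \pi, \theta_y), \\ \hat v^h_{+-} &= \hat v^h(T_x, -T_y) = \hat v^h(\theta_x, \theta_y + \pi), \\ \hat v^h_{--} &= \hat v^h(-T_x, -T_y) = \hat v^h(\theta_x + \pi, \theta_y + \pi). \end{aligned} \tag{4.2a}$$

Here $\hat v^h$ denotes the Fourier transform of the error $v^h$. Furthermore,

$$V^h = \left(\hat v^h_{++},\ \hat v^h_{-+},\ \hat v^h_{+-},\ \hat v^h_{--}\right)^T, \tag{4.2b}$$

a vector with 4 × 4 elements.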

Let the fine grid be numbered by indices $(k_1, k_2)$, with $k_1$ running from 0 to $N_1 - 1$, and $k_2$ from 0 to $N_2 - 1$. On the coarse grid we can use the same indices, but now $k_1 = 0, \ldots, \tfrac12 N_1 - 1$ and $k_2 = 0, \ldots, \tfrac12 N_2 - 1$. The restriction operator coarsens an arbitrary discrete variable $a^h$ to $a^H = I_h^H a^h$ according to

$$a^H_{k_1,k_2} = \tfrac14\left(a^h_{2k_1,2k_2} + a^h_{2k_1+1,2k_2} + a^h_{2k_1,2k_2+1} + a^h_{2k_1+1,2k_2+1}\right). \tag{4.3}$$
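The volume-averaging restriction (4.3) is a 2 × 2 block mean. The NumPy sketch below uses an arbitrary 8 × 8 grid and arbitrary wave numbers, and `restrict` is a hypothetical helper. It applies the restriction to a single fine-grid wave and checks that the result is the same wave, sampled at the cells $(2k_1, 2k_2)$, multiplied by the factor $\tfrac14(1 + T_x)(1 + T_y)$ that appears as $R_{++}$ in (4.4b).

```python
import numpy as np

def restrict(a):
    """Volume averaging (4.3): mean over each 2x2 block of fine cells."""
    n1, n2 = a.shape
    return a.reshape(n1 // 2, 2, n2 // 2, 2).mean(axis=(1, 3))

N1, N2 = 8, 8
k1 = np.arange(N1)[:, None]
k2 = np.arange(N2)[None, :]
th_x, th_y = 2 * np.pi * 3 / N1, 2 * np.pi * 1 / N2
wave = np.exp(1j * (k1 * th_x + k2 * th_y))      # a single fine-grid Fourier mode

coarse = restrict(wave)
Tx, Ty = np.exp(1j * th_x), np.exp(1j * th_y)
R = 0.25 * (1 + Tx) * (1 + Ty)                   # R_{++} of (4.4b)
print(np.allclose(coarse, R * wave[::2, ::2]))   # prints: True
```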

The Fourier transform of the restricted fine-grid residual, in terms of waves on the fine grid, is given by

$$\widehat{I_h^H L^h} = \left(R_{++}L_{++}\ \ R_{-+}L_{-+}\ \ R_{+-}L_{+-}\ \ R_{--}L_{--}\right), \tag{4.4a}$$

where

$$R_{++} = R(T_x, T_y) = \tfrac14(1 + T_x)(1 + T_y). \tag{4.4b}$$

Here the convention (4.2a) is used. The coarse-grid residual operator is

$$L^H = \frac{1}{2h}\left[A^+(1 - T_x^{-2}) + A^-(T_x^2 - 1) + B^+(1 - T_y^{-2}) + B^-(T_y^2 - 1)\right], \tag{4.5}$$

which can be obtained by Galerkin coarsening ($I_h^H L^h I_H^h$) or by direct evaluation. We ignore the singular behaviour of $L^H$ for a moment. The coarse-grid correction operator is

$$\hat K = I - \begin{pmatrix} P_{++} \\ P_{-+} \\ P_{+-} \\ P_{--} \end{pmatrix} \left(Z_{++}\ \ Z_{-+}\ \ Z_{+-}\ \ Z_{--}\right), \tag{4.6a}$$

where

$$P_{++} = \tfrac14(1 + T_x^{-1})(1 + T_y^{-1}), \qquad Z_{\pm\pm} = (L^H)^{-1} R_{\pm\pm} L_{\pm\pm}. \tag{4.6b}$$

Note that the prolongation operator is the conjugate transpose of the restriction operator in Fourier space. The identity matrix I in (4.6a) has size 16 × 16. The CGC operator should be applied to the vector $V^h$. Because $\hat K$ operates on 4 waves simultaneously, we only have to consider half the frequency domain in each direction, i.e., $0 \le \theta_x < \pi$ and $0 \le \theta_y < \pi$.
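The statement that (4.5) follows from Galerkin coarsening can be checked in Fourier space: summing $R_{\pm\pm} L_{\pm\pm} P_{\pm\pm}$ over the four coupled waves should reproduce $L^H$. The sketch below takes $h = 1$ and uses random 4 × 4 matrices in place of $A^\pm$ and $B^\pm$, since the identity is linear in these blocks; the function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
m, h = 4, 1.0
Ap, Am, Bp, Bm = (rng.standard_normal((m, m)) for _ in range(4))

def L_fine(Tx, Ty):
    """Symbol of L^h, (3.5)."""
    return (Ap * (1 - 1 / Tx) + Am * (Tx - 1) + Bp * (1 - 1 / Ty) + Bm * (Ty - 1)) / h

def L_coarse(Tx, Ty):
    """Symbol of L^H, (4.5), written in terms of the fine-grid shifts."""
    return (Ap * (1 - Tx**-2) + Am * (Tx**2 - 1)
            + Bp * (1 - Ty**-2) + Bm * (Ty**2 - 1)) / (2 * h)

def R(Tx, Ty):
    return 0.25 * (1 + Tx) * (1 + Ty)            # restriction, (4.4b)

def P(Tx, Ty):
    return 0.25 * (1 + 1 / Tx) * (1 + 1 / Ty)    # prolongation, (4.6b)

Tx, Ty = np.exp(0.7j), np.exp(-1.9j)
# Prolongate one coarse mode to the 4 coupled fine waves, apply L^h, restrict back.
galerkin = sum(R(sx * Tx, sy * Ty) * L_fine(sx * Tx, sy * Ty) * P(sx * Tx, sy * Ty)
               for sx in (1, -1) for sy in (1, -1))
print(np.allclose(galerkin, L_coarse(Tx, Ty)))
```

The same check also illustrates the remark that $P$ is the complex conjugate of $R$ on the unit circle.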

If $L^H$ is singular, its pseudo-inverse should be used. To justify this, we assert the following.

Lemma 4.1. Let the coarse-grid residual $L^H$ and the restriction $R_{\pm\pm}$ of the fine-grid residual be of the form given above. Then the linear system

$$L^H Z_{\pm\pm} = R_{\pm\pm} L_{\pm\pm} = \tfrac14(1 \pm T_x)(1 \pm T_y) L_{\pm\pm} \tag{4.7}$$

is consistent.

Proof. If $L^H$ is not singular, then this is trivial. The singularities of $L^H$ correspond to those of $L^h$ if $(T_x, T_y)$ in the latter is replaced by $(T_x^2, T_y^2)$. The singularities of $L^h$ are listed in Lemma 3.1. In the first case, $T_x^2 = T_y^2 = 1$ implies $R_{\pm\pm} L_{\pm\pm} = 0$, so the linear system (4.7) is consistent. In the second case, $R_{\pm-} L_{\pm-} = 0$. What remains reduces to

$$(T_x^{-2} A^+ + A^-)\, Z_{\pm+} = \pm T_x^{-1} A^+ + A^-. \tag{4.8a}$$

Diagonalisation by $Q_1$ yields

$$(T_x^{-2}\lambda^+_{1l} + \lambda^-_{1l})\,(Q_1^{-1} Z_{\pm+} Q_1)_{l} = \pm T_x^{-1}\lambda^+_{1l} + \lambda^-_{1l}, \qquad l = 1, \ldots, 4. \tag{4.8b}$$

This is singular only if $\lambda_{1l} = 0$, for which (4.8b) is consistent. Case (iii) is proven in a similar way.

In case (iv) we have to consider $u = v = 0$. Then the fourth row and fourth column of the left-hand and right-hand sides of (4.7) have zeroes, implying consistency. The remaining 3 × 3 matrix on the left-hand side has the determinant

$$\left(\frac{c}{2h}\right)^3 (1 - T_x^2)(1 - T_x^{-2})(1 - T_y^2)(1 - T_y^{-2}) \ne 0, \tag{4.9}$$

and is therefore non-singular. □
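The determinant (4.9) can be confirmed numerically for one arbitrary choice of c, h and frequencies. This sketch assumes Q and Λ as written in (2.6) and the coarse-grid symbol (4.5); `split` and `L_coarse` are hypothetical helper names. It also checks that the entropy row and column of $L^H$ vanish at $u = v = 0$, as used in the proof.

```python
import numpy as np

def split(k1, k2, u, v, c):
    """A^+/A^- (or B^+/B^-) from (2.6) and (3.3)."""
    Q = np.array([[k1, k2, 0.0, k1],
                  [k2, -k1, 0.0, k2],
                  [-1.0, 0.0, 0.0, 1.0],
                  [0.0, 0.0, 1.0, 0.0]])
    q = k1 * u + k2 * v
    lam = np.array([q - c, q, q, q + c])
    Qinv = np.linalg.inv(Q)
    return (Q @ np.diag(np.maximum(lam, 0.0)) @ Qinv,
            Q @ np.diag(np.minimum(lam, 0.0)) @ Qinv)

def L_coarse(u, v, c, h, Tx, Ty):
    """Coarse-grid symbol (4.5) in terms of the fine-grid shifts."""
    Ap, Am = split(1.0, 0.0, u, v, c)
    Bp, Bm = split(0.0, 1.0, u, v, c)
    return (Ap * (1 - Tx**-2) + Am * (Tx**2 - 1)
            + Bp * (1 - Ty**-2) + Bm * (Ty**2 - 1)) / (2 * h)

c, h = 1.3, 0.5
Tx, Ty = np.exp(0.4j), np.exp(1.1j)
LH = L_coarse(0.0, 0.0, c, h, Tx, Ty)            # case (iv): u = v = 0

print(np.allclose(LH[3, :], 0), np.allclose(LH[:, 3], 0))   # entropy row/column vanish
det3 = np.linalg.det(LH[:3, :3])
pred = (c / (2 * h))**3 * (1 - Tx**2) * (1 - Tx**-2) * (1 - Ty**2) * (1 - Ty**-2)
print(np.isclose(det3, pred))                    # compare with (4.9)
```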


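Lemma 4.2. If m is the rank of $L^H$, then the coarse-grid correction operator has m zero eigenvalues and 16 − m eigenvalues equal to 1.

Proof. Let the matrix $Q_K$ and its inverse be given by

$$Q_K = \begin{pmatrix} P_{++} & 0 & 0 & 0 \\ P_{-+} & 1 & 0 & 0 \\ P_{+-} & 0 & 1 & 0 \\ P_{--} & 0 & 0 & 1 \end{pmatrix}, \qquad Q_K^{-1} = \begin{pmatrix} 1/P_{++} & 0 & 0 & 0 \\ -P_{-+}/P_{++} & 1 & 0 & 0 \\ -P_{+-}/P_{++} & 0 & 1 & 0 \\ -P_{--}/P_{++} & 0 & 0 & 1 \end{pmatrix}. \tag{4.10}$$

Note that $Q_K$ is a regular matrix for $0 \le \theta_x < \pi$ and $0 \le \theta_y < \pi$. Because $Q_K$ does not contain $A^\pm$ or $B^\pm$, we can carry out a similarity transform as if $\hat K$ were just a 4 × 4 matrix with scalar entries rather than 4 × 4 blocks. The result is

$$\hat K' = Q_K^{-1} \hat K Q_K = \begin{pmatrix} I - (L^H)^+ L^H & -Z_{-+} & -Z_{+-} & -Z_{--} \\ 0 & I & 0 & 0 \\ 0 & 0 & I & 0 \\ 0 & 0 & 0 & I \end{pmatrix}. \tag{4.11}$$

The upper-left block follows from $\sum Z_{\pm\pm}P_{\pm\pm} = (L^H)^+ \sum R_{\pm\pm}L_{\pm\pm}P_{\pm\pm} = (L^H)^+ L^H$, by the Galerkin property of (4.5). For a regular $L^H$, we obviously have 12 eigenvalues equal to 1, and 4 equal to 0. For a singular $L^H$ of rank m < 4, the use of the pseudo-inverse causes $I - (L^H)^+ L^H$ to have m eigenvalues 0 and 4 − m eigenvalues 1. □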
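Lemma 4.2 can be illustrated by assembling the 16 × 16 symbol (4.6a) directly. This sketch assumes Q and Λ as written in (2.6), takes $h = 1$, and uses a hand-written SVD-based pseudo-inverse. All function names are hypothetical. It considers one regular case and the singular case $u = 0$, $T_y = 1$ (case (ii) of Lemma 3.1 on the coarse grid), and counts the eigenvalues near 0 and near 1.

```python
import numpy as np

def split(k1, k2, u, v, c):
    """A^+/A^- (or B^+/B^-) from the eigen-decomposition (2.6), cf. (3.3)."""
    Q = np.array([[k1, k2, 0.0, k1], [k2, -k1, 0.0, k2],
                  [-1.0, 0.0, 0.0, 1.0], [0.0, 0.0, 1.0, 0.0]])
    q = k1 * u + k2 * v
    lam = np.array([q - c, q, q, q + c])
    Qinv = np.linalg.inv(Q)
    return (Q @ np.diag(np.maximum(lam, 0.0)) @ Qinv,
            Q @ np.diag(np.minimum(lam, 0.0)) @ Qinv)

def pinv(a, tol=1e-10):
    """Moore-Penrose pseudo-inverse via the SVD; also returns the numerical rank."""
    U, s, Vh = np.linalg.svd(a)
    keep = s > tol * s[0]
    s_inv = np.zeros_like(s)
    s_inv[keep] = 1.0 / s[keep]
    return Vh.conj().T @ np.diag(s_inv) @ U.conj().T, int(keep.sum())

def cgc_symbol(u, v, c, th_x, th_y):
    """16x16 symbol of K = I - P (L^H)^+ R L^h, cf. (4.6a,b), with h = 1."""
    Ap, Am = split(1.0, 0.0, u, v, c)
    Bp, Bm = split(0.0, 1.0, u, v, c)
    Lh = lambda X, Y: Ap * (1 - 1 / X) + Am * (X - 1) + Bp * (1 - 1 / Y) + Bm * (Y - 1)
    Tx, Ty = np.exp(1j * th_x), np.exp(1j * th_y)
    LH = 0.5 * (Ap * (1 - Tx**-2) + Am * (Tx**2 - 1) + Bp * (1 - Ty**-2) + Bm * (Ty**2 - 1))
    LH_pinv, rank = pinv(LH)
    signs = [(1, 1), (-1, 1), (1, -1), (-1, -1)]            # ++, -+, +-, --
    K = np.eye(16, dtype=complex)
    for a, (sa, ta) in enumerate(signs):                    # row block: prolongation P_a
        P = 0.25 * (1 + 1 / (sa * Tx)) * (1 + 1 / (ta * Ty))
        for b, (sb, tb) in enumerate(signs):                # column block: restriction R_b
            R = 0.25 * (1 + sb * Tx) * (1 + tb * Ty)
            Z = LH_pinv @ (R * Lh(sb * Tx, tb * Ty))
            K[4*a:4*a+4, 4*b:4*b+4] -= P * Z
    return K, rank

def count_eigenvalues(K, tol=1e-6):
    """Number of eigenvalues near 0 and near 1."""
    ev = np.linalg.eigvals(K)
    return int(np.sum(np.abs(ev) < tol)), int(np.sum(np.abs(ev - 1) < tol))

K_reg, m_reg = cgc_symbol(0.3, 0.2, 1.0, 0.8, 0.3)        # regular L^H
K_sing, m_sing = cgc_symbol(0.0, 0.2, 1.0, 0.8, 0.0)      # u = 0, T_y = 1: singular L^H
# Lemma 4.2 predicts m zeros and 16 - m ones in each case.
print(m_reg, count_eigenvalues(K_reg))
print(m_sing, count_eigenvalues(K_sing))
```

The explicit SVD threshold is a design choice for the sketch: it makes the numerical rank of $L^H$, and hence the treatment of the singular case, independent of library defaults.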


5.  Relaxation 

In this section several relaxation schemes are considered. They are presented in a form that is compatible with the coarse-grid correction operator. A relaxation scheme is constructed by replacing the residual operator $L^h$ by an operator $\tilde L^h$ that can be easily inverted. Then the error is updated according to

$$\tilde L^h(\tilde v^h - v^h) = -L^h v^h. \tag{5.1}$$


In the following, we set $h = h_x = h_y = 1$. Some useful definitions are:

$$L^h = M_0 - M_1 - M_2, \quad M_0 = |A| + |B|, \quad M_1 = A^+T_x^{-1} + B^+T_y^{-1}, \quad M_2 = -A^-T_x - B^-T_y. \tag{5.2}$$


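The general form of a relaxation operator, acting on $V^h$ (4.2b), is

$$S = \begin{pmatrix} G_{1,++} & G_{2,++} & G_{3,++} & G_{4,++} \\ G_{2,-+} & G_{1,-+} & G_{4,-+} & G_{3,-+} \\ G_{3,+-} & G_{4,+-} & G_{1,+-} & G_{2,+-} \\ G_{4,--} & G_{3,--} & G_{2,--} & G_{1,--} \end{pmatrix}. \tag{5.3}$$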


We start with schemes for which $G_2 = G_3 = G_4 = 0$, i.e., there is no coupling of frequencies.

The simplest relaxation scheme is Point-Jacobi. It updates the error according to

$$\tilde v^h = \left[I - \beta M_0^{-1} L^h\right] v^h, \tag{5.4a}$$

implying

$$G_{1,++} = I - \beta M_0^{-1} L_{++}. \tag{5.4b}$$

The other $G_{1,\pm\pm}$ follow by the convention (4.2a). The parameter β describes the amount of under- or overrelaxation. Standard Point-Jacobi is obtained for β = 1. The scheme is stable for 0 < β ≤ 1, so overrelaxation is excluded. The matrix $M_0$ is positive semi-definite. It becomes singular for u = v = 0. In that case the pseudo-inverse should be taken for $M_0^{-1}$. Note that this is only done for the purpose of analysis. A different approach is adopted for the numerical experiments in §7.

A Multi-Stage method with two stages is obtained by first performing a PJ-step to an intermediate level, and then using the residual at the intermediate level to make the full step from the old to the new level:

$$v^h_{(1)} = v^h - \beta_1 M_0^{-1} L^h v^h, \qquad \tilde v^h = v^h - \beta_2 M_0^{-1} L^h v^h_{(1)}. \tag{5.5a}$$

This implies

$$G^{MS} = I - \beta_2 M_0^{-1} L^h \left(I - \beta_1 M_0^{-1} L^h\right). \tag{5.5b}$$

Now we have two parameters, $\beta_1$ and $\beta_2$.

We proceed with the schemes that introduce a coupling between frequencies. Red-Black or checkerboard relaxation is a scheme that first performs a PJ-step on the cells with indices $(2k_1, 2k_2)$ and $(2k_1+1, 2k_2+1)$. Then new residuals are computed, and the cells with indices $(2k_1+1, 2k_2)$ and $(2k_1, 2k_2+1)$ are updated by a PJ-step. This variant will be denoted by RB1. The variant that relaxes in the opposite order will be called RB2. For RB1:

$$G^{RB1}_{1,++} = I - \beta M_0^{-1} L_{++} - G^{RB1}_{4,--}, \qquad G^{RB1}_{2,\pm\pm} = G^{RB1}_{3,\pm\pm} = 0, \qquad G^{RB1}_{4,++} = \tfrac12\beta^2 M_0^{-1}(M_{1,--} + M_{2,--}) M_0^{-1} L_{--}. \tag{5.6}$$

For RB2 we have

$$G^{RB2}_{1,\pm\pm} = G^{RB1}_{1,\pm\pm}, \qquad G^{RB2}_{2,\pm\pm} = G^{RB2}_{3,\pm\pm} = 0, \qquad G^{RB2}_{4,\pm\pm} = -G^{RB1}_{4,\pm\pm}. \tag{5.7}$$

The distinction between RB1 and RB2 becomes important if the relaxation scheme is used in combination with the CGC operator (as in [13]).

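Another relaxation scheme is obtained if the 4 cells contained within one coarse-grid cell are relaxed simultaneously, ignoring the contributions from outside the 4 cells. This method is called Block-Jacobi. In physical space we have

$$\begin{pmatrix} M_0 & A^- & B^- & 0 \\ -A^+ & M_0 & 0 & B^- \\ -B^+ & 0 & M_0 & A^- \\ 0 & -B^+ & -A^+ & M_0 \end{pmatrix} (\tilde{\mathbf v}^h - \mathbf v^h) = -\beta\,(L^h v^h)_{\mathrm{cells}}, \tag{5.8a}$$

where

$$\mathbf v^h = \left(v^h_{2k_1,2k_2},\ v^h_{2k_1+1,2k_2},\ v^h_{2k_1,2k_2+1},\ v^h_{2k_1+1,2k_2+1}\right)^T \tag{5.8b}$$

and $(L^h v^h)_{\mathrm{cells}}$ denotes the residual at the same four cells. In Fourier space

$$S^{BJ} = I - \beta H^{-1}\,\mathrm{diag}\left(L_{++}, L_{-+}, L_{+-}, L_{--}\right). \tag{5.9a}$$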


Mulder 


477 


Here H has the same structure as S in (5.3), with elements

$$H_{1,++} = M_0 - \tfrac12(M_1 + M_2), \quad H_{2,++} = -\tfrac12(A^+T_x^{-1} + A^-T_x), \quad H_{3,++} = -\tfrac12(B^+T_y^{-1} + B^-T_y), \quad H_{4,++} = 0. \tag{5.9b}$$

Finally, consider Gauss-Seidel relaxation. This is a global relaxation method, in contrast to the schemes mentioned above. In two dimensions there are four sweep directions. The relaxation operators for each direction are

$$\begin{aligned}
&\text{north-east } (\nearrow): && G^{GS1} = I - \beta\left[L^h - A^-T_x - B^-T_y\right]^{-1} L^h, \\
&\text{south-west } (\swarrow): && G^{GS2} = I - \beta\left[L^h + A^+T_x^{-1} + B^+T_y^{-1}\right]^{-1} L^h, \\
&\text{south-east } (\searrow): && G^{GS3} = I - \beta\left[L^h - A^-T_x + B^+T_y^{-1}\right]^{-1} L^h, \\
&\text{north-west } (\nwarrow): && G^{GS4} = I - \beta\left[L^h + A^+T_x^{-1} - B^-T_y\right]^{-1} L^h.
\end{aligned} \tag{5.10}$$

Here pseudo-inverses should be used if necessary. There are two variants of Symmetric Gauss-Seidel (SGS), namely $G^{GS2}G^{GS1}$ and $G^{GS4}G^{GS3}$. The variant that sweeps in all four directions will be denoted by S2GS:

$$G^{S2GS} = G^{GS4}G^{GS3}G^{GS2}G^{GS1}. \tag{5.11}$$

It should be noted that this is not the correct way to carry out the analysis. The reason is the well-known fact that

$$\exp\left[2\pi i\left(\frac{k_1 i_1}{N_1} + \frac{k_2 i_2}{N_2}\right)\right] \tag{5.12}$$

is not an eigenfunction of the relaxation operator. Therefore, the Fourier analysis is not valid, although reasonable estimates may still be obtained for all but the longer waves.


6.  Multigrid  convergence  factors 

The multigrid convergence factor, also known as the asymptotic convergence rate, is given by

$$\lambda = \max \lambda(\theta_x, \theta_y), \qquad \lambda(\theta_x, \theta_y) = \rho\left(S^{\nu_2} \hat K S^{\nu_1}\right). \tag{6.1}$$

The maximum is taken over the entire spectrum. The spectral radius is denoted by ρ(·). The operator $S^{\nu_2}\hat K S^{\nu_1}$ describes $\nu_1$ pre-relaxation sweeps on the finest grid, restriction to the next coarser grid, exact solution of the coarse-grid equations, prolongation of the coarse-grid correction to the finest grid, and, finally, $\nu_2$ post-relaxation sweeps. In this section we will only consider the choice $\nu_1 = 1$, $\nu_2 = 0$.

For a singular residual operator, λ can become 1 for waves that are not seen by the operator and hence are not damped. In that case it is better to consider the multigrid convergence factor of the residual

$$\lambda_r = \max \lambda_r(\theta_x, \theta_y), \qquad \lambda_r(\theta_x, \theta_y) = \rho\left(L^h S^{\nu_2} \hat K S^{\nu_1} (L^h)^+\right). \tag{6.2}$$

This quantity can be observed in numerical experiments (cf. [13]). If $L^h$ is regular, (6.1) and (6.2) provide the same result.

The evaluation of $\lambda_r$ is considerably simplified by the following theorem, which is motivated by remarks on strong alignment, the flow being aligned with the grid, in [2] (see also [3: §10.1.1]).
Theorem 6.1. Consider an arbitrary linear residual operator with constant coefficients and a restriction operator of the form (4.3). Then the multigrid convergence factor for the shortest wave in the x- or y-direction cannot be better than the convergence factor of the relaxation scheme for this wave.

Proof. Consider the shortest wave in the y-direction ($\theta_y = \pi$, $T_y = -1$). In physical space this wave is described by

$$v_{k_1,2k_2} = -v_{k_1,2k_2+1} = v_{k_1,2k_2+2}, \qquad k_1 = 0, 1, \ldots, N_1 - 1, \quad k_2 = 0, 1, \ldots, \tfrac12 N_2 - 1. \tag{6.3}$$

The restriction operator $I_h^H$ causes this wave to vanish on the coarser grid:

$$v^H = I_h^H v^h = 0 \qquad (\theta_y = \pi). \tag{6.4}$$

For an arbitrary linear residual operator $L^h$ with constant coefficients we have

$$I_h^H L^h v^h = L^h I_h^H v^h, \tag{6.5}$$

which implies that the coarse-grid residual vanishes. As a result, the coarse-grid correction operator K (4.1) has no effect: K = I. The convergence factor of the multigrid scheme for this specific wave is therefore completely determined by the relaxation scheme used. The same is true for the shortest wave in the x-direction. □
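The mechanism behind Theorem 6.1 is easy to reproduce on a periodic grid. The sketch below uses a scalar constant-coefficient upwind operator of the form (3.5) with $h = 1$ and arbitrary coefficients; the scalar case and the helper names `restrict` and `apply_L` are assumptions made for brevity. It builds a wave of the form (6.3) with an arbitrary profile in x, and checks that both the wave and its residual vanish after restriction, so the coarse grid receives nothing to correct.

```python
import numpy as np

def restrict(a):
    """Volume averaging (4.3) over 2x2 blocks of fine cells."""
    n1, n2 = a.shape
    return a.reshape(n1 // 2, 2, n2 // 2, 2).mean(axis=(1, 3))

def apply_L(v, ap, am, bp, bm):
    """Scalar upwind operator (3.5), h = 1, periodic grid.

    np.roll(v, 1, axis=0)[k1] = v[k1 - 1] (T_x^{-1}); np.roll(v, -1, axis=0)[k1] = v[k1 + 1] (T_x).
    """
    return (ap * (v - np.roll(v, 1, axis=0)) + am * (np.roll(v, -1, axis=0) - v)
            + bp * (v - np.roll(v, 1, axis=1)) + bm * (np.roll(v, -1, axis=1) - v))

N1, N2 = 16, 16
rng = np.random.default_rng(3)
f = rng.standard_normal(N1)                           # arbitrary profile in x
v = f[:, None] * (-1.0) ** np.arange(N2)[None, :]     # theta_y = pi, cf. (6.3)

print(np.allclose(restrict(v), 0.0))                                  # (6.4)
print(np.allclose(restrict(apply_L(v, 0.7, -0.2, 0.4, -0.1)), 0.0))   # coarse residual vanishes
```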

This theorem has a rather unpleasant consequence for the application of any multigrid method to the differential equations under study, as pointed out by Brandt [2: §2.1, §3.3]. The rule of thumb in designing relaxation schemes is that they must remove the high-frequency part of the error. The coarse-grid correction operator will take care of the low frequencies. However, we run into problems for a purely convective equation like the fourth component of (2.4). Because convection is a locally one-dimensional phenomenon, a differential operator for pure convection will not depend on the structure of the flow field perpendicular to a streamline. Any good discretisation of this operator will have the same property. Suppose that a streamline is aligned with one of the grid lines, say the x-direction. According to the rule of thumb, the relaxation scheme must remove the high frequencies, including those perpendicular to the streamline. But the residual corresponding to the convection operator will not depend on waves in this direction, so they cannot be removed directly by relaxation; they actually must remain unaffected. However, for problems with boundaries, the boundary data will require the error components in the perpendicular direction to vanish. This information can be communicated to the discrete solution only by a relaxation scheme acting along the x-direction (for those waves that have a high frequency in the y-direction and cannot be represented on coarser grids). This will require $O(N_1)$ single-grid iterations for any relaxation scheme that uses only local data. Thus, a grid-independent convergence factor can never be obtained.

We will now determine lower bounds of $\lambda_r$ for the relaxation schemes of the previous section. Only the fourth component of (2.4) is considered, with u > 0 and v = 0, i.e.,

$$L^h = \frac{u}{h}(1 - T_x^{-1}), \tag{6.6}$$

which describes one-dimensional flow along the x-axis. We assume $\theta_y = \pi$ and derive ρ(S) for the various relaxation schemes.
Fig. 1. Multigrid convergence factor $\lambda_r(\theta_x, \theta_y)$ for one sweep of Red-Black relaxation followed by a coarse-grid correction. Parameters used are u = v = 0.5, c = 1, and β = 1. The instability occurs at $\lambda_r(0, \tfrac12\pi) = \lambda_r(\tfrac12\pi, 0) = 1.074$. The local maximum in the center of the figure is $\lambda_r(\tfrac12\pi, \tfrac12\pi) = 1$. The figure is periodic modulo π.


For Point-Jacobi with 0 < β < 1, we obtain an eigenvalue λ for which

$$|\lambda|^2 = 1 - 2\beta(1 - \beta)(1 - \cos\theta_x). \tag{6.7}$$

This implies $\lambda_r \ge 1 - 2\pi^2\beta(1 - \beta)N_1^{-2} = 1 - O(h^2)$. For the Multi-Stage scheme, we find $\lambda_r \ge 1 - O(h^2)$ in the same way.
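For the scalar residual (6.6) with $h = 1$, $M_0 = u$ and the Point-Jacobi symbol is $1 - \beta(1 - T_x^{-1})$. By Theorem 6.1 the coarse grid does not act on the $\theta_y = \pi$ waves. The sketch below uses the arbitrary choices β = 0.5 and $N_1 = 64$ and a hypothetical helper `pj_factor`. It confirms (6.7) for all nonzero $\theta_x$ on the grid and compares the largest modulus, attained at $\theta_x = 2\pi/N_1$, with the estimate $1 - 2\pi^2\beta(1-\beta)N_1^{-2}$.

```python
import numpy as np

def pj_factor(beta, theta):
    """Point-Jacobi symbol G = 1 - beta M_0^{-1} L^h for L^h = u(1 - T_x^{-1}), h = 1."""
    Tx = np.exp(1j * theta)
    return 1 - beta * (1 - 1 / Tx)               # u cancels against M_0 = u

beta, N1 = 0.5, 64
theta = 2 * np.pi * np.arange(1, N1) / N1       # nonzero theta_x; theta_y = pi assumed
G = pj_factor(beta, theta)
print(np.allclose(np.abs(G)**2, 1 - 2 * beta * (1 - beta) * (1 - np.cos(theta))))   # (6.7)

# The slowest mode theta_x = 2 pi / N1 gives the lower bound for lambda_r.
print(np.abs(G).max(), 1 - 2 * np.pi**2 * beta * (1 - beta) / N1**2)
```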

The operator $S^{RB}$ for Red-Black relaxation has two double eigenvalues

$$\lambda_\pm = 1 - \beta + \tfrac12\beta T_x^{-2}\left[\beta \pm \sqrt{\beta^2 + 4T_x^2(1 - \beta)}\right]. \tag{6.8a}$$

For β = 1 this results in

$$\lambda_- = 0, \qquad \lambda_+ = T_x^{-2}, \tag{6.8b}$$

implying $\lambda_r \ge 1$ for all $\theta_x$. For 0 < β < 1 and for the long waves ($\theta_x \ll 1$) we find

$$|\lambda_+| = 1 - O(\theta_x^2), \qquad |\lambda_-| = (1 - \beta)^2\left[1 + O(\theta_x^2)\right]. \tag{6.9}$$

Thus, $\lambda_r \ge 1 - O(h^2)$. Red-Black is therefore not a suitable relaxation scheme if grid-independent convergence is desired. The situation for Red-Black is actually worse. As a single-grid scheme, RB is stable for 0 < β ≤ 1. Figure 1 shows that it becomes unstable in combination with the CGC operator for β = 1, just as in the one-dimensional case [13].
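The eigenvalues (6.8a) can be re-derived from the Red-Black procedure itself. For $\theta_y = \pi$, the checkerboard masks $\tfrac12(1 \pm (-1)^{k_1+k_2})$ couple the wave $(\theta_x, \pi)$ with $(\theta_x + \pi, 0)$. Composing a masked PJ-step on the red cells with one on the black cells therefore gives a 2 × 2 symbol. The sketch below assumes the scalar residual (6.6) with $h = 1$. The helper names are hypothetical, and the sign of the square root in (6.8a) does not matter because both roots are compared as a set.

```python
import numpy as np

def rb1_symbol(beta, theta):
    """2x2 Red-Black (RB1) symbol for L^h = u(1 - T_x^{-1}), theta_y = pi, h = 1.

    Rows/columns: the coupled waves ++ = (theta_x, pi) and -- = (theta_x + pi, 0).
    """
    t = np.exp(-1j * theta)                       # T_x^{-1}; the -- wave has -T_x
    D = beta * np.diag([1 - t, 1 + t])            # beta M_0^{-1} L for ++ and --
    red = 0.5 * np.array([[1, 1], [1, 1]])        # mask (1 + (-1)^(k1+k2)) / 2
    black = 0.5 * np.array([[1, -1], [-1, 1]])    # mask (1 - (-1)^(k1+k2)) / 2
    I = np.eye(2)
    return (I - black @ D) @ (I - red @ D)        # PJ on red cells first, then on black cells

def rb_eigs_formula(beta, theta):
    """lambda_+ and lambda_- from (6.8a)."""
    Tx = np.exp(1j * theta)
    root = np.sqrt(beta**2 + 4 * Tx**2 * (1 - beta))
    return np.array([1 - beta + 0.5 * beta * Tx**-2 * (beta + s * root) for s in (1, -1)])

def same_set(a, b, tol=1e-10):
    """True if every element of a is within tol of some element of b and vice versa."""
    return (all(np.min(np.abs(b - x)) < tol for x in a)
            and all(np.min(np.abs(a - x)) < tol for x in b))

ok = all(same_set(np.linalg.eigvals(rb1_symbol(beta, theta)), rb_eigs_formula(beta, theta))
         for beta in (0.6, 1.0) for theta in (0.05, 0.7, 2.0))
print(ok)
print(np.abs(np.linalg.eigvals(rb1_symbol(1.0, 0.3))))   # (6.8b): moduli 0 and 1
```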
Block-Jacobi relaxation for the simplified residual (6.6) has two double eigenvalues

$$\lambda_1 = 1 - \beta, \qquad \lambda_2 = 1 - \beta(1 - T_x^{-2}). \tag{6.10a}$$

For β = 1 we have the same result as in (6.8b). Otherwise

$$|\lambda_2| = \left[1 - 2\beta(1 - \beta)(1 - \cos 2\theta_x)\right]^{1/2}. \tag{6.10b}$$

For the long waves $|\lambda_2| \approx 1 - 2\beta(1 - \beta)\theta_x^2$, implying $\lambda_r \ge 1 - O(h^2)$. Again we have a useless relaxation scheme.
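The Block-Jacobi eigenvalues (6.10a) can be checked by assembling the 4 × 4 symbol of (5.9a) from the elements $H_{1..4}$. These elements are placed according to the pattern of (5.3): $H_2$ couples $\theta_x$ with $\theta_x + \pi$, $H_3$ couples $\theta_y$ with $\theta_y + \pi$. The sketch below assumes scalar coefficients ($A^+ = $ `ap`, $A^- = $ `am`, and so on) and $h = 1$, and the function name `bj_symbol` is hypothetical. It evaluates the residual (6.6) at $\theta_y = \pi$.

```python
import numpy as np

def bj_symbol(Tx, Ty, ap, am, bp, bm, beta):
    """4x4 Block-Jacobi symbol S = I - beta H^{-1} diag(L_{++}, L_{-+}, L_{+-}, L_{--})
    for scalar coefficients A^+ = ap, A^- = am, B^+ = bp, B^- = bm and h = 1."""
    def parts(X, Y):
        M0 = (ap - am) + (bp - bm)               # |A| + |B| for scalars ap >= 0 >= am
        M1 = ap / X + bp / Y
        M2 = -am * X - bm * Y
        H = [M0 - 0.5 * (M1 + M2),                # H_1
             -0.5 * (ap / X + am * X),            # H_2: couples to theta_x + pi
             -0.5 * (bp / Y + bm * Y),            # H_3: couples to theta_y + pi
             0.0]                                 # H_4
        return M0 - M1 - M2, H                    # L from (5.2), H elements from (5.9b)

    # Index bit 0 flips T_x, bit 1 flips T_y: 0 = ++, 1 = -+, 2 = +-, 3 = --.
    signs = [(1, 1), (-1, 1), (1, -1), (-1, -1)]
    Hmat = np.zeros((4, 4), dtype=complex)
    Ldiag = np.zeros(4, dtype=complex)
    for r, (sx, sy) in enumerate(signs):
        Ldiag[r], H = parts(sx * Tx, sy * Ty)
        for k in range(4):                        # H_{k+1} sits in column r XOR k, cf. (5.3)
            Hmat[r, r ^ k] = H[k]
    return np.eye(4) - beta * np.linalg.solve(Hmat, np.diag(Ldiag))

u, beta, th = 0.8, 0.7, 0.4
Tx, Ty = np.exp(1j * th), -1.0 + 0j              # theta_y = pi, residual (6.6)
ev = np.sort_complex(np.linalg.eigvals(bj_symbol(Tx, Ty, u, 0.0, 0.0, 0.0, beta)))
lam1, lam2 = 1 - beta, 1 - beta * (1 - Tx**-2)  # (6.10a)
pred = np.sort_complex(np.array([lam1, lam1, lam2, lam2]))
print(np.allclose(ev, pred))
```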

It is clear that relaxation schemes that use only local data can never provide a good multigrid convergence factor with the restriction and prolongation operators considered here.

Gauss-Seidel relaxation, which is a global scheme, may be expected to give better results, as indicated by the numerical experiments mentioned in the introduction. Indeed, GS is a natural scheme for purely convective equations if the sweep direction coincides with the flow direction. If this is not true, then GS fails. To illustrate this, consider the fourth equation of the system (2.4), with u ≥ 0 and v ≥ 0, but not u = v = 0. The corresponding discrete residual operator is

$$L^h = \frac{u}{h}(1 - T_x^{-1}) + \frac{v}{h}(1 - T_y^{-1}). \tag{6.11}$$

Then, for β = 1,

$$G^{GS1} = 0, \qquad G^{GS2} = \frac{uT_x^{-1} + vT_y^{-1}}{u + v}, \qquad G^{GS3} = \frac{vT_y^{-1}}{u(1 - T_x^{-1}) + v}, \qquad G^{GS4} = \frac{uT_x^{-1}}{u + v(1 - T_y^{-1})}. \tag{6.12}$$

GS1 follows the flow and is an exact solver. For the other three schemes, we obtain an estimated multigrid convergence factor $\lambda_r \ge 1$ by setting either u = 0 and $T_x = -1$, or v = 0 and $T_y = -1$. The same is true for 0 < β < 1. Consequently, GS is not an appropriate relaxation scheme for arbitrary flows.
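The closed forms (6.12) follow from the general sweep symbols (5.10) with the scalar coefficients $A^+ = u$, $B^+ = v$, $A^- = B^- = 0$ of (6.11). The sketch below assumes $h = 1$ and β = 1 and uses a hypothetical helper `gs_symbols`. It evaluates (5.10) directly, compares the result with (6.12), and shows that $|G| = 1$ in the cases named above, where Theorem 6.1 rules out help from the coarse grid.

```python
import numpy as np

def gs_symbols(ap, am, bp, bm, Tx, Ty, beta=1.0):
    """Symbols of the four Gauss-Seidel sweeps (5.10), scalar coefficients, h = 1."""
    L = ap * (1 - 1 / Tx) + am * (Tx - 1) + bp * (1 - 1 / Ty) + bm * (Ty - 1)
    Lt = {1: L - am * Tx - bm * Ty,               # north-east
          2: L + ap / Tx + bp / Ty,               # south-west
          3: L - am * Tx + bp / Ty,               # south-east
          4: L + ap / Tx - bm * Ty}               # north-west
    return {k: 1 - beta * L / Lk for k, Lk in Lt.items()}

u, v = 0.6, 0.3
Tx, Ty = np.exp(0.5j), np.exp(-1.2j)
G = gs_symbols(u, 0.0, v, 0.0, Tx, Ty)            # residual (6.11): A^+ = u, B^+ = v
pred = {1: 0.0,
        2: (u / Tx + v / Ty) / (u + v),
        3: (v / Ty) / (u * (1 - 1 / Tx) + v),
        4: (u / Tx) / (u + v * (1 - 1 / Ty))}
print(all(np.isclose(G[k], pred[k]) for k in G))  # (6.12)

# |G| = 1 where the coarse grid cannot help (theta = pi in the other direction):
print(abs(gs_symbols(0.0, 0.0, v, 0.0, -1.0, Ty)[3]),   # GS3 with u = 0, T_x = -1
      abs(gs_symbols(u, 0.0, 0.0, 0.0, Tx, -1.0)[2]),   # GS2 with v = 0, T_y = -1
      abs(gs_symbols(u, 0.0, 0.0, 0.0, Tx, -1.0)[4]))   # GS4 with v = 0, T_y = -1
```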

A better performance might be expected for Symmetric Gauss-Seidel, given the experimental results mentioned in the introduction. For the residual (6.11) and β = 1 we have $G^{GS2}G^{GS1} = 0$, but the other combination, $G^{GS4}G^{GS3}$, results in $\lambda_r = 1$, e.g., for $T_x = 1$ and $v \to 0$, given $T_y = -1$. This can be improved by underrelaxation, using β = ½. Still better results are obtained by the following form of underrelaxation:

$$\begin{aligned}
&\text{north-east } (\nearrow): && G^{GS1} = I - \left[L^h - A^-(1 + T_x) - B^-(1 + T_y)\right]^{-1} L^h, \\
&\text{south-west } (\swarrow): && G^{GS2} = I - \left[L^h + A^+(1 + T_x^{-1}) + B^+(1 + T_y^{-1})\right]^{-1} L^h, \\
&\text{south-east } (\searrow): && G^{GS3} = I - \left[L^h - A^-(1 + T_x) + B^+(1 + T_y^{-1})\right]^{-1} L^h, \\
&\text{north-west } (\nwarrow): && G^{GS4} = I - \left[L^h + A^+(1 + T_x^{-1}) - B^-(1 + T_y)\right]^{-1} L^h.
\end{aligned} \tag{6.13}$$
These expressions are obtained by subtracting the blocks of $L^h$ that are ignored in the relaxation matrix $\tilde L^h$ for GS from the main diagonal of the relaxation matrix. For the residual (6.11) it turns out that $\lambda_r(u,v) <$ … . Multigrid convergence factors based on $G^{GS2}G^{GS1}$ for the full system (2.4) are displayed in Fig. 2. Bad convergence is obtained near the singularities of the residual operator (3.8), but on the whole the convergence factors are fairly good.

Fig. 2. Multigrid convergence factor $\lambda_r(u, v)$ for damped Symmetric Gauss-Seidel relaxation on a 64 × 64 grid (c = 1, ν = 1).

Fig. 3. Single-grid amplification factor $\kappa_r(u, v)$ for S2GS, showing the instability. The values shown are obtained for $N_1 = N_2 = 64$, c = 1, and ν = 1.

Fig. 4. Multigrid convergence factor $\lambda_r(u, v)$ for S2GS, obtained for $N_1 = N_2 = 64$, c = 1, and ν = 1. Bad convergence factors are obtained near or at the singularities of the residual.

There remains the combination of four sweeps denoted by S2GS (undamped). This is an exact solver for the fourth component of the system (2.4). Also, if both the horizontal and vertical components of the velocity are supersonic, i.e., |u| > c and |v| > c, S2GS is exact for the full system (2.4). The convergence factor will therefore be determined by the first 3 equations of (2.4) for |u| < c and/or |v| < c. Figure 3 shows the amplification factor for S2GS without the use of multigrid, i.e., the quantity

$$\kappa_r = \max \kappa_r(\theta_x, \theta_y), \qquad \kappa_r(\theta_x, \theta_y) = \rho\left(L^h S (L^h)^+\right). \tag{6.14}$$

Surprisingly, this scheme is unstable, in contrast to SGS. The instability does not disappear for damped versions of S2GS. Because the instability occurs for the longer waves, it can be overcome by the CGC operator, but only if $\nu = \nu_1 + \nu_2 = 1$. Applying the relaxation scheme more than once per grid per cycle causes the instability to appear in the multigrid scheme. Figure 4 shows the multigrid convergence factor for a 64 × 64 grid. Bad convergence factors are obtained near the singularities of the residual, namely for u ≈ 0 and |v| < c, for |u| ≈ c and |v| < c, and for similar expressions with u and v interchanged.

Thus we find that damped SGS and S2GS do not provide uniformly good convergence rates. However, the validity of this conclusion may be questioned, as Fourier modes are not the proper eigenfunctions of the Gauss-Seidel relaxation operator. Therefore, some numerical experiments have been carried out.

7.  Numerical  experiments 

The  experiments  are  performed  for  flow  through  a  straight  channel,  with  inflow  at  the  left 
side  and  outflow  at  the  right.  The  grid  is  square  and  uniform.  Upwind  differencing  for 
the  full  system  of  nonlinear  Euler  equations  (2.1)  is  accomplished  by  van  Leer’s  flux-vector 
splitting [22] or the P-variant of Osher's scheme [4,16]. For the flux-vector splitting (FVS), characteristic boundary conditions are used at the inlet and outlet; that is, the characteristic variables corresponding to the x-direction are computed from the free-stream values for incoming characteristics and from the computational domain for outgoing ones. From these, the boundary values and the full flux are determined. For Osher's scheme we can use overspecification.
The  reason  for  this  different  treatment  is  the  fact  that  van  Leer’s  FVS  does  not  have  the 
correct  eigenvalue  structure.  It  is  not  a  very  good  approximate  Riemann-solver,  because  it 
does  not  recognise  slip-lines.  Osher’s  scheme  automatically  provides  the  correct  switching 
between  incoming  and  outgoing  characteristics  at  the  inlet  and  outlet.  The  lower  and  upper 
walls  are  simulated  by  adding  an  extra  zone  with  reflected  state  quantities  (cf.  [11]).  The 
free-stream values are chosen to be

$$\rho_\infty = \ldots, \qquad v_\infty = 0, \qquad c_\infty = 1, \tag{7.1}$$

and different values of $u_\infty$ are considered. The ratio of specific heats γ is set to 1.4. As initial conditions, we take the free-stream values and add random noise with a relative amplitude of 0.1%.

We consider both the Correction Scheme (CS) and the Full Approximation Storage (FAS) scheme, with a coarsest grid of size 1 × 1. The CS is used to solve the linear system arising from the application of the Switched Evolution/Relaxation (SER) method [10,23] to the residual. This is the method described in [11]. The linear system is given by

$$\left(\frac{1}{\Delta t} - \frac{\partial r}{\partial w}\right)(\tilde w - w) = r(w) = -L(w), \tag{7.2a}$$

and the choice

$$\frac{1}{\Delta t} = \frac{1}{\epsilon_{SER}} \max\left(\frac{|r_i|}{|w_i| + b_i}\right) \tag{7.2b}$$

changes the "backward Euler" scheme (7.2a) into an SER scheme. The constant $\epsilon_{SER}$ controls the relative change per iteration of the solution, and can usually be set to 1. The bias $b_i$ is given by $b_1 = b_4 = 0$, $b_2 = b_3 = \rho c$, and prevents division by zero.

A nonlinear version of (7.2) has also been described in [11]. At the beginning of an FAS multigrid cycle, the "time-step" (7.2b) is computed. Gauss-Seidel relaxation is carried out using a relaxation matrix that is computed locally on each grid and discarded once used. This matrix is a modification of $M_0$ (5.2):

$$\tilde M_0 = \frac{1}{\Delta t} + M_0 \qquad \text{or} \qquad \tilde M_0 = \frac{1}{\Delta t} + \frac{\partial L_{k_1,k_2}}{\partial w_{k_1,k_2}}. \tag{7.3}$$


After the local linear system has been solved, the residual is updated nonlinearly. A genuinely nonlinear relaxation scheme can be obtained by using (7.3) several times per point until the local nonlinear residual vanishes. Here we will consider only one iteration per point. More iterations may be needed near shocks and sonic lines (cf. [13]).

Fig. 5. Multigrid convergence factor $\lambda_r(u, v)$ for S2GS on a 64 × 64 grid with c = 1 and ν = 1, using van Leer's flux-vector splitting.

The multigrid scheme described by Hemker and Spekreijse [4] is similar to the nonlinear one proposed in [11], except for the "time-step" (7.2b). We have included this scheme in the experiments by leaving out the 1/Δt term. In all cases, W-cycles are used, with one pre-relaxation sweep ($\nu_1 = 1$, $\nu_2 = 0$).

Apart from channel flow, periodic boundary conditions on all four sides of the domain are considered as well. In that case, a stationary solution can generally not be reached by time-accurate integration from arbitrary initial data. The artificial viscosity provided by the upwind differencing allows for convergence to a numerical steady state that is not necessarily unique. For the examples presented below, FVS converges to the free-stream values if global conservation of the initial data is imposed. In the FAS scheme this is carried out on the coarsest 1 × 1 grid; for the Correction Scheme this is done on the finest grid at the end of every multigrid cycle. Osher's scheme does not provide a unique solution for the examples considered, but the convergence factor of the residual can still be monitored.

The matrices of Eq. (3.3) can be used to predict the convergence factor for Osher's scheme, but not for van Leer's flux-vector splitting. For the latter, two-level estimates of $\lambda_r$ now have to be computed from the matrices

$$A^\pm = P^{-1}\frac{\partial F^\pm}{\partial q}P, \qquad B^\pm = P^{-1}\frac{\partial G^\pm}{\partial q}P. \tag{7.4}$$

If |u| < c (or |v| < c), these matrices are different from those in Eq. (3.3). The corresponding residual operator has the same singularities as listed in Lemma 3.1, with the exception of the one at u = 0 in case (ii), the one at v = 0 in case (iii), and u = v = 0 in case (iv). FVS does not recognise the eigenvalue u (or v) going through zero. The result is a considerable amount of numerical viscosity around slip-lines, which actually helps to improve the smoothing factor because the effect of strong alignment is reduced. The multigrid convergence factors for S2GS are shown in Fig. 5. The additional smearing at u = 0 or v = 0 removes the problem of strong alignment, but convergence is still bad near the remaining singularities.

We proceed with the numerical experiments. Convergence factors are based on the
L2-norm of the residual. First consider the single-grid instability of S2GS. Table 1 shows
the single-grid amplification factor κr for u/c = 1.005. The instability can be seen in the
experiments with Osher's scheme, but it is not as strong as predicted. For FVS, the instability
is less pronounced and does not show up in the numerical experiments. It is likely to appear
on still finer grids, or for other velocities. The SER option has not been used for
the results of Table 1 (εSER → ∞). Applying the SER scheme (εSER = 1) suppresses
the instability, although convergence is not obtained. Results for the linear and nonlinear
relaxation schemes, the latter with only one linear iteration per cell, are practically identical
for this simple test problem.
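The text does not spell out how a convergence factor is extracted from the residual history. A minimal Python sketch, assuming the factor is the geometric-mean reduction per cycle of the L2-norm of the residual (the function names and the sample history are illustrative, not taken from the original code):

```python
import math

def l2_norm(residual):
    """Discrete L2-norm (root mean square) of a flattened residual vector."""
    return math.sqrt(sum(r * r for r in residual) / len(residual))

def average_factor(norms):
    """Geometric-mean reduction per cycle from the first to the last norm."""
    cycles = len(norms) - 1
    return (norms[-1] / norms[0]) ** (1.0 / cycles)

def last_cycle_factor(norms):
    """Reduction achieved by the final cycle alone."""
    return norms[-1] / norms[-2]

history = [1.0, 0.9, 0.81, 0.729]   # made-up residual norms, one per cycle
print(average_factor(history))      # approximately 0.9
```

A factor above 1, as for several entries of Table 1, signals growth of the residual; the root-mean-square scaling of the norm cancels in the ratio.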

Next we study the effect of the singularity (iii) of Lemma 3.1 for v = 0 and 0 < |u| < c,
a case of strong alignment. Only Osher's scheme is affected; FVS is not singular for the
given velocities. Table 2 shows the results of the numerical experiments, using damped SGS
relaxation. Convergence factors with the CS and FAS approach are practically identical,
with or without the SER scheme.

Another singularity is the one at u = c and v = 0. Here Osher's scheme and FVS have
practically the same convergence factors, around 0.90. The results are again almost identical
for the linear and nonlinear multigrid scheme, without or with the SER option. However,
there is one exception. For Osher's scheme and channel flow, the FAS scheme without the
SER option produced negative densities, causing the computer program to stop. Including
the SER option removed this problem. The Correction Scheme converged normally. The
divergence of the FAS scheme is caused by the coarsest grid (1 x 1), where the relaxation
matrix becomes singular. The SER option suppresses this. The Correction Scheme does not
recognise negative densities or energies during the multigrid cycle, and can therefore handle
fairly large changes in the solution, as long as the final corrections to the state quantities
remain reasonable.

Finally,  consider  the  singularity  at  u  =  v  =  0.  The  differential  equation  does  not 
have  a  unique  steady  state  in  this  case.  Van  Leer’s  FVS  still  provides  a  unique  numerical 
solution  because  of  the  additional  numerical  viscosity.  Table  3  shows  convergence  results  on 
a  128  x  128  grid.  The  computer  program  stopped  in  several  instances  because  of  negative 
densities  or  strong  divergence.  No  problems  were  encountered  with  the  SER  scheme. 

The numerical experiments confirm the failure of damped SGS near the singularities
of the residual operator. Similar results are found for S2GS. In practical applications, the
singularities, such as a shock or sonic line, will occur only on a small subset of the computational
domain, so the overall convergence rate will be fairly good. However, the singularity
at v = 0 will still result in slow convergence if the flow is aligned with the grid over a large
part of the computational domain.
u∞ = 1.005

Osher 

FVS 

N1 = N2

κr

periodic 

channel 

κr

periodic 

channel 

16 

0.955 

0.91 

■ 

0.90 

32 

0.960 

0.87 

0.89 

64 

1.944 

1.19 

1.01 

0.89 

0.88 

128 

3.964 

1.28 

1.16 

1.231 

0.97 

0.87 

Table 1. Single-grid amplification factors for S2GS on grids of various sizes, showing the instability
of the scheme. The result of the local-mode analysis is denoted by κr. The other values are
determined from numerical experiments on the full system of nonlinear Euler equations, for periodic
boundary conditions and for channel flow. The linear and nonlinear relaxation schemes provide
practically identical results for the test problem considered. The SER option has not been used.


u∞ = 0.200

Osher 

FVS 

N1 = N2

λr

periodic 

channel 

λr

periodic 

channel 

16 

0.84 

■

■

32 

0.857 

0.90 

■

■

■

64 

■

0.94 

0.92 

■

0.63 

■

128 

0.93 

0.92 

0.63 

■

Table 2. Multigrid convergence factors for damped SGS. The result of the local-mode analysis is
denoted by λr. This table illustrates the sensitivity of Osher's scheme to the singularity at v = 0.
Results for the CS and the FAS scheme are practically identical, without or with the use of the
SER scheme.


u∞ = v∞ = 0

Osher 

FVS 

MG  scheme 

SER 

periodic 

channel 

periodic 

channel 

FAS 

no 

- 

- 

- 

CS 

no 

0.91 

- 

■

- 

FAS 

yes 

0.90 

0.90 

CS 

yes 

0.90 

- , - 

0.90 

■

0.73 

Table 3. Multigrid convergence factors for damped SGS on a 128 x 128 grid, showing the effect
of the singularity at u = v = 0. The predicted convergence rate is λr = 0.990 for Osher's scheme
and λr = 0.581 for van Leer's flux-vector splitting. The dashes indicate cases of divergence. The
SER option is obviously required for stability.

8.  Conclusions  and  discussion 

The one-dimensional character of convection makes ineffective any multigrid method whose
restriction operator combines four cells or points on the fine grid into one on the coarser grid.
Grid-independent convergence rates can only be obtained if the relaxation scheme has a good
damping rate for the entire spectrum, including the long waves. Relaxation schemes that
use only local data, such as Point-Jacobi, Multi-Stage schemes, Red-Black relaxation, or
Block-Jacobi, are thereby disqualified. This conclusion is merely a restatement of a problem
described by Brandt [2], who calls it strong alignment, the flow being aligned with the grid.
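Strong alignment already shows up in a scalar model. The sketch below is an illustration under assumptions that are not taken from the paper: first-order upwind differencing of u w_x + v w_y with u, v ≥ 0, damped point Jacobi with weight ω, and the usual high-frequency range max(|θx|, |θy|) ≥ π/2. For flow aligned with the grid (v = 0), the mode with θx = 0, θy = π, which oscillates across the flow but is constant along it, is not damped at all.

```python
import cmath
import math

def jacobi_amplification(theta_x, theta_y, u, v, omega):
    """Amplification factor of damped point Jacobi for the first-order upwind
    discretisation of u*w_x + v*w_y (u, v >= 0, not both zero)."""
    symbol = u * (1 - cmath.exp(-1j * theta_x)) + v * (1 - cmath.exp(-1j * theta_y))
    diagonal = u + v
    return 1 - omega * symbol / diagonal

def smoothing_factor(u, v, omega, n=64):
    """Largest |g| over the high-frequency modes of an n x n Fourier sample."""
    worst = 0.0
    for a in range(-n // 2, n // 2):
        for b in range(-n // 2, n // 2):
            tx, ty = 2 * math.pi * a / n, 2 * math.pi * b / n
            if max(abs(tx), abs(ty)) < math.pi / 2:
                continue  # low frequency: left to the coarse grid
            worst = max(worst, abs(jacobi_amplification(tx, ty, u, v, omega)))
    return worst

print(smoothing_factor(1.0, 0.0, 0.5))  # 1.0: aligned flow, no smoothing
print(smoothing_factor(1.0, 1.0, 0.5))  # about 0.81: diagonal flow
```

Changing ω cannot help in the aligned case, because the symbol vanishes identically for θx = 0, so g = 1 for every θy.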

A somewhat surprising result is the failure of symmetric Gauss-Seidel (damped SGS
or S2GS) relaxation, a global scheme. Earlier numerical experiments (§1) indicate grid-independent
convergence factors even for undamped SGS. Furthermore, as a single-grid
scheme, S2GS is exact for a purely convective equation, and also for the full system that
represents the Euler equations, if both the horizontal and vertical velocity components are
supersonic. Therefore, one would expect S2GS to overcome the problem of strong alignment.
However, local-mode analysis shows that there are still waves that give rise to poor convergence
factors near the singularities of the residual operator, both for damped SGS and S2GS.
This is confirmed by the numerical experiments on the nonlinear Euler equations described
in §7, where the nonlinear upwind differencing is accomplished by Osher's scheme or van
Leer's flux-vector splitting.

For practical purposes, damped SGS can still be useful, depending on the kind of singularities
in the flow field. Even S2GS can be used, although its instability as a single-grid
scheme means that it is not robust. The singularity at the sonic line will usually be confined to
a small subset of the computational domain, so that the overall convergence factor will be
fairly good. The singularity at v ≈ 0 (or u ≈ 0) will cause problems if the flow is aligned
with the grid over a large part of the domain. Such a situation is bound to occur in channel
flow. Indeed, Koren [8] observes slow convergence in precisely this setting. FVS does not
suffer from this problem, at the expense of a lower spatial accuracy.

The numerical experiments show that the linear or nonlinear SER multigrid scheme
proposed in [11] is more robust than its non-SER variant advocated by Hemker and Spekreijse
[4]. Two of the relaxation schemes suggested in their paper, Red-Black and S2GS, are
unstable according to the analysis of §6. Red-Black is stable as a single-grid scheme, but
becomes unstable when used in a multigrid code, whereas the opposite happens for S2GS.

We  will  now  discuss  some  alternatives  that  may  lead  to  a  uniformly  good  convergence 
rate.  The  main  ideas  can  be  found  elsewhere,  in  various  contexts  [2:§3.3,  3:§10.5]. 

First,  artificial  viscosity  can  be  added  to  remove  the  one-dimensional  character  of  the 
residual  operator  for  pure  convection.  This  approach  is  recommended  by  Brandt  [2].  It  is 
expected  that  a  fairly  large  amount  is  required,  causing  a  degradation  of  the  spatial  accuracy. 
The  latter  may  be  avoided  by  Brandt’s  double  discretisation.  It  should  be  noted  that  upwind 
schemes  have  a  built-in  amount  of  artificial  viscosity,  which  here  turns  out  to  be  insufficient 
for  good  convergence  rates.  More  viscosity  is  added  implicitly  by  van  Leer’s  flux-vector 
splitting  [22],  as  this  scheme  is  not  a  very  good  approximate  Riemann  solver,  but  this  still 
does  not  suffice  to  obtain  uniformly  good  convergence  rates. 

Secondly, one could consider more powerful global relaxation schemes. Several options
can be considered, such as Line-Jacobi, Line-Gauss-Seidel, zebra relaxation, Incomplete LU
decomposition, or its line-variant. Some of these schemes have been studied for single-grid
relaxation in [23]. It may be possible to design a scheme that provides a good grid-independent
convergence factor at O(N) cost even without multigrid (N = N1N2 being the
total number of cells or points). In [14] it is shown that damped Alternating Direction Line-Jacobi
has a multigrid convergence factor λr(u,v) < 0.526. For most values of u and v, we
have λr(u,v) ~ ~. The damping is obtained in the same way as in (6.13). The relaxation
matrix is obtained by taking the main diagonal of the linearised residual operator and 2
off-diagonals in one direction. The 2 other off-diagonals are then subtracted from the main
diagonal, thus leaving a tridiagonal system. Finally, a SER "time-step" must be added to
the main diagonal. Here a diagonal element is understood to be a 4 x 4 block.
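A scalar sketch of the relaxation matrix just described, under assumptions that are not taken from the paper: a 5-point operator with constant coefficients (keys C, W, E, S, N) standing in for the 4 x 4 blocks, zero values outside the grid, lines running in the x-direction, and a hypothetical diagonal shift `sigma` standing in for the SER term (sigma = 0 mimics εSER → ∞). The sketch does not claim to reproduce any further damping from (6.13); an alternating-direction variant would repeat the sweep with y-lines.

```python
def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system; lower[0] and upper[-1] are not used."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def residual(w, f, st):
    """r = f - A w for a constant 5-point stencil st (keys C, W, E, S, N)."""
    ny, nx = len(w), len(w[0])

    def val(j, i):
        # Values outside the grid are taken to be zero.
        return w[j][i] if 0 <= j < ny and 0 <= i < nx else 0.0

    return [[f[j][i] - (st["C"] * w[j][i] + st["W"] * val(j, i - 1)
                        + st["E"] * val(j, i + 1) + st["S"] * val(j - 1, i)
                        + st["N"] * val(j + 1, i))
             for i in range(nx)] for j in range(ny)]


def x_line_jacobi(w, f, st, sigma=0.0):
    """One x-line Jacobi sweep. The y-neighbour coefficients S and N are
    subtracted from the diagonal, leaving a tridiagonal system per line;
    sigma is a hypothetical extra diagonal term. All lines use the old iterate."""
    r = residual(w, f, st)
    nx = len(w[0])
    diag = st["C"] - st["S"] - st["N"] + sigma
    new_w = []
    for j, row in enumerate(w):
        delta = thomas([st["W"]] * nx, [diag] * nx, [st["E"]] * nx, r[j])
        new_w.append([row[i] + delta[i] for i in range(nx)])
    return new_w
```

For first-order upwind convection aligned with the lines (S = N = 0) one sweep solves the system exactly, since the x-lines decouple; with a y-component present the iteration still converges, but more slowly.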

As  a  third  alternative,  a  different  type  of  coarsening  can  be  considered  that  reflects  the 
one-dimensional  character  of  convection.  Instead  of  4  cells,  or  points,  we  can  combine  2, 
thereby  doubling  the  number  of  grid-levels.  The  direction  of  coarsening  can  be  alternated 
when  going  to  progressively  coarser  grids.  The  last  option  remains  to  be  explored.  Hopefully, 
grid-independent  convergence  rates  can  be  obtained  without  global  relaxation  schemes  in  this 
setting.  Conceptually,  it  should  not  be  necessary  to  use  these  global  schemes,  because  the 
coarser grids must take care of global errors. As a first result we mention that damped
Line-Jacobi and semi-coarsening result in a two-level multigrid convergence factor λr = |
for the linearised Euler equations with constant coefficients. The line relaxation must be
carried out in one direction and the coarsening, of 2 cells at a time, in the other direction.
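To make the contrast between the coarsening strategies concrete, here is a sketch for cell-centred grids. It assumes, as is common for finite-volume residuals but not stated here, that a coarse-grid residual is the sum of the fine-grid residuals it covers. `restrict_full` combines 4 cells, `restrict_semi` combines 2 cells in one direction, and `semi_coarsening_hierarchy` alternates the direction and shows the roughly doubled number of grid levels. All names are illustrative.

```python
def restrict_full(fine):
    """Standard coarsening: each coarse cell sums a 2 x 2 block of fine cells."""
    ny, nx = len(fine), len(fine[0])
    return [[fine[2 * j][2 * i] + fine[2 * j][2 * i + 1]
             + fine[2 * j + 1][2 * i] + fine[2 * j + 1][2 * i + 1]
             for i in range(nx // 2)] for j in range(ny // 2)]

def restrict_semi(fine, direction):
    """Semi-coarsening: each coarse cell sums 2 fine cells along one direction."""
    ny, nx = len(fine), len(fine[0])
    if direction == "x":
        return [[fine[j][2 * i] + fine[j][2 * i + 1] for i in range(nx // 2)]
                for j in range(ny)]
    return [[fine[2 * j][i] + fine[2 * j + 1][i] for i in range(nx)]
            for j in range(ny // 2)]

def semi_coarsening_hierarchy(nx, ny, first="x"):
    """Grid sizes when the coarsening direction alternates from level to level."""
    sizes = [(nx, ny)]
    direction = first
    while nx > 1 or ny > 1:
        # Halve along the current direction if possible, otherwise along the other one.
        if (direction == "x" and nx > 1) or ny == 1:
            nx //= 2
        else:
            ny //= 2
        sizes.append((nx, ny))
        direction = "y" if direction == "x" else "x"
    return sizes

print(len(semi_coarsening_hierarchy(128, 128)))  # 15 levels, versus 8 with full coarsening
```

For the combination with line relaxation mentioned above, `restrict_semi` would be applied with a fixed direction, orthogonal to the relaxation lines, instead of alternating.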

The effects of nonlinear singularities, such as shocks, have not been considered in this
paper. They may give rise to additional complications, just as in [13]. At this point, however,
it appears to be difficult enough to find a good scheme for the linear case.
1. INTRODUCTION

The Navier-Stokes Computer project was begun by Princeton University and NASA
Langley Research Center in mid-1984. Its goal has been the design and practical demonstration
of a scalable, local memory, parallel processing supercomputer constructed from
conventional chip technology. The original focus was on a special-purpose machine which
would be dedicated to computations of large, time-consuming three-dimensional problems
in fluid dynamics; hence, the designation Navier-Stokes Computer (NSC). Of particular
interest were two broad categories of problems: conventional steady-state aerodynamics
and time-dependent simulations of transition and turbulence. However, the actual design
that has emerged [1] places the Princeton/NASA NSC quite close to the realm of
general-purpose supercomputers.

A  major  application  of  Computational  Fluid  Dynamics  algorithms  is  the  study  of  flow 
about  an  aircraft  travelling  at  transonic  or  supersonic  speeds.  The  calculations  are  based 
on  local  discretizations  (finite-difference,  finite-volume  or  finite-element)  of  either  the  Euler 
equations  or  the  Reynolds-averaged  Navier-Stokes  equations.  (The  range  of  length  scales 
present  in  such  turbulent  flows  is  far  too  extreme  to  be  resolved  in  direct  solutions  of  the 
Navier-Stokes  equations.)  The  basic  computational  task  is  the  solution  of  a  large  system 
of  non-linear  algebraic  equations  of  mixed  type  (elliptic-hyperbolic).  A  wide  variety  of 
iterative  solution  schemes  have  been  devised.  These  typically  involve  point  relaxation 
with  multigrid  acceleration  or  sophisticated  line  relaxation  techniques  which  include  LSOR 
and/or  ADI  as  basic  components.  A  representative  selection  of  modern  algorithms  can  be 
found in [2].

Time-dependent  simulations  of  transition  and  turbulence  have  aimed  to  elucidate  the 
basic  physics  of  these  complex  phenomena  and  to  aid  in  the  development  of  effective 
engineering  models.  With  few  exceptions  these  simulations  have  been  restricted  to  simple 
geometries  such  as  the  flat  plate  or  even  the  fully  periodic  box.  Moreover,  they  have 
essentially  been  confined  to  incompressible  flow.  The  most  effective  numerical  techniques 
for these calculations on conventional supercomputers are global, spectral methods. A
brief summary of these methods is provided by Hussaini and Zang [3] and an exhaustive
discussion is furnished by Canuto et al. [4]. For these applications time accuracy for the
non-linear  convective  terms  requires  a  time-step  smaller  than  their  explicit  stability  limit. 
The  pressure  gradient  and  perhaps  the  diffusion  terms  are  handled  implicitly.  The  linear 
equations  which  need  to  be  solved  at  every  time-step  are  just  the  Poisson  and  (positive 
definite)  Helmholtz  equations. 

For  the  simplest  problems,  direct  solution  methods  are  the  most  efficient  means  to 
solve  the  implicit  equations.  Recently,  however,  Erlebacher,  Zang  and  Hussaini  [5]  have 
demonstrated  that  spectral  multigrid  methods  can  be  used  to  extend  the  class  of  problems 
which are amenable to global discretization. Nevertheless, it is not yet clear whether spectral
methods retain their advantages over, say, finite-difference methods, on local memory
parallel processors. The much greater communication demands of the global discretization
may well tip the balance in favor of the less accurate, but simpler local discretizations.
For this reason low-order finite-difference methods have been the focal point of our initial
investigations of how well transition and turbulence algorithms can exploit the unique
architecture  of  the  Navier-Stokes  Computer. 

The  purpose  of  this  paper  is  to  describe  in  some  detail  the  current  NSC  architecture,  to 
describe  an  elementary  algorithm  (which  includes  a  multigrid  component)  for  simulations 
of  isotropic  turbulence,  to  explain  how  this  algorithm  would  be  implemented  on  the  NSC, 
and  to  assess  its  performance. 

2.  ARCHITECTURAL  OVERVIEW  OF  THE  NSC 

The  NSC  is  a  multi-purpose  parallel-processing  supercomputer  which  is  designed  to 
perform  numerical  simulations  of  a  wide  variety  of  large,  numerically  intensive,  complex 
scientific problems. Rapid solution of these problems is attained through a global architecture
which distributes the computations over a fairly small number of powerful local
memory parallel processors, called Nodes. Each Node has the performance of a Class
VI supercomputer such as a Cray XMP or Cyber 205. The current projection for a 64
Node NSC is a storage capacity in excess of 32 Gwords, and a peak speed in excess of
40 GFLOPS. Due to the modular design of the NSC Nodes, upgrading of the hardware
components is relatively easy. Hence, the memory/speed characteristics of the final design
may  differ  from  the  characteristics  detailed  herein,  which  are  based  on  the  utilization  of 
1986  technology. 

2.1  Global  Architecture 

The  global  architecture  of  the  NSC  involves  the  interconnection  of  the  Nodes  through 
both  a  global  and  a  local  communication  network.  The  global  network  utilizes  a  global 
bus  to  link  the  entire  Node  array  to  a  front-end  computer.  Although  the  global  bus  is 
primarily  used  to  transfer  data  and  commands  between  the  front-end  and  the  Nodes,  it 
may  also  be  invoked  for  the  transfer  of  data  between  any  two  Nodes  of  the  array.  The 
front-end  is  a  general-purpose  computer  which  provides  the  operating  environment  for  the 
NSC.  In-line  it  is  used  to  load  data  and  commands  to  the  Nodes  and  to  monitor  the  Node 
array.  Off-line  it  provides  program  development,  work  station  support,  and  data  analysis 
capabilities. 
The local communication path consists of a hypercube network [6]. Internode communication
links for it are implemented with fiber-optic transmission lines, providing data
transmission rates orders of magnitude faster than those provided by the global bus. Consequently,
most (if not all) internode data transfers for an algorithm are routed through the
hypercube  network.  A  schematic  of  the  global  architecture,  where  a  subset  of  the  Nodes 
and  a  simple  2-D  nearest  neighbor  interconnect  network  are  illustrated,  is  presented  in 
figure  1. 
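The paper does not describe how Nodes are numbered. Under the conventional binary labelling of a hypercube (an assumption here), a 64-Node NSC forms a 6-dimensional cube in which each Node is linked to the 6 Nodes whose addresses differ from its own in exactly one bit, so any two Nodes are at most 6 links apart:

```python
def hypercube_neighbors(node, dimension):
    """Addresses of the Nodes linked directly to `node`: flip one address bit."""
    return [node ^ (1 << bit) for bit in range(dimension)]

def link_distance(a, b):
    """Fewest links between two Nodes: the Hamming distance of their addresses."""
    return bin(a ^ b).count("1")

NODES = 64
DIMENSION = NODES.bit_length() - 1   # 6 for 64 Nodes

print(hypercube_neighbors(0, DIMENSION))  # [1, 2, 4, 8, 16, 32]
print(link_distance(0, NODES - 1))        # 6, the largest distance in the cube
```

The nearest-neighbour 2-D interconnect drawn in figure 1 embeds in such a cube, since Gray-code numbering along each grid direction keeps grid neighbours one bit apart.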

2.2  Nodal  Architecture 

The  Nodal  architecture  is  designed  to  permit  parallel  operation  of  both  the  memory  and 
the computational units, providing large throughput rates for a wide variety of computational
procedures. Architecturally central to this operation are the Multiplane Interleaved
Memory, MASNET Cache Router, Double Buffered Cache, Floating-point/Logical-unit
Organizational NETwork (FLONET), and the Arithmetic and Logical Unit (ALU). The
interconnection of these devices is illustrated in figure 2. Computations are begun by
accessing multiple operand vectors from the memory planes, and routing sequential elements
of these vectors through the MASNET Cache Router and Double Buffered Cache
to  FLONET.  In  FLONET,  which  consists  of  a  series  of  nonblocking  switch  networks,  the 
operand  vectors  are  routed  to  the  appropriate  input  ports  of  the  ALU.  The  ALU  itself 
consists  of  sixteen  independent  ALU  functional  units.  Operands  for  a  particular  ALU 
functional  unit  are  accessed  from  FLONET,  while  results  from  that  unit  are  routed  back 
to  FLONET.  If  the  result  is  an  intermediate  one,  it  is  routed  through  FLONET  to  the 


Figure  1.  Overall  Layout  of  the  Navier-Stokes  Computer. 


Figure  2.  Schematic  of  the  Nodal  Architecture. 


input  port  of  the  next  ALU  functional  unit  in  the  pipeline,  where  further  processing  occurs 
and the new result is routed back to FLONET. In this manner, a number of ALU functional
units may be interconnected to form the required parallel processing vector pipeline
trees.  Final  results  from  the  ALU  are  then  routed  out  of  FLONET,  through  the  Double 
Buffered  Cache,  and  stored  in  the  Multiplane  Interleaved  Memory. 

The Multiplane Interleaved Memory consists of sixteen 128 Mbyte memory planes, giving
each Node a local memory of 512 Mwords for 32-bit words. Elements of the vectors
are stored in and accessed from the memory planes using either constant stride or scatter-gather
addressing. Each of the memory planes has complete address translation capabilities
to provide full scatter-gather capability. The address translation information is stored in
high-speed look-up tables. In addition to being linked to the MASNET Cache Router, the
memory planes are connected to the Memory Interplane Vector Routing Unit (MASNET).
It consists of 52 32-bit switch elements which are configured to form a 16x16 nonblocking
switch. This switch network is used to transfer memory contents between memory planes
by broadcasting vector streams of data from one memory plane to several others, transferring
vector streams of data between two memory planes, or shuffling words of data between
memory planes.

The  MASNET  Cache  Router  is  used  to  route  vectors  between  the  memory  planes  and 
the  Double  Buffered  Cache.  It  consists  of  two  16x16  MASNET  type  switches,  one  of  which 
is  used  to  route  the  operand  vectors  from  memory  to  the  cache,  the  other  of  which  is  used 
to  route  the  result  vectors  from  the  cache  to  memory.  This  routing  procedure  provides  a 
reconfigurable  means  for  routing  data  from  any  memory  plane  to  any  cache  in  the  Double 
Buffered  Cache,  and  vice  versa.  The  MASNET  Cache  Router  is  also  intimately  involved 
in the internode communication process, as discussed later in this section.
The Double Buffered Cache consists of 16 independent 8 Kword write-thru double-buffered
cache planes. Typically, once the switch states of the MASNET Cache Router
have  been  set,  each  cache  plane  is  associated  with  a  particular  memory  plane.  Essentially 
the  caches  allow  a  word  to  be  read  from  and  written  to  each  memory  plane  each  clock 
cycle.  This  process  allows  for  vector  operations  such  as  A=A+B  to  be  performed,  where 
the  updated  values  of  vector  A  are  stored  in  the  memory  locations  previously  occupied  by 
A. 

The  ALU  consists  of  sixteen  independent  ALU  functional  units,  which  are  configured 
from  sixteen  floating-point  processing  units  (FPs),  sixteen  floating-point/integer/logical 
units  (FPILs),  and  eight  Multiplexers  (MUXs).  The  FPs  perform  floating-point  addition, 
subtraction,  multiplication,  floating  to  fixed-point,  or  fixed  to  floating-point  operations. 
Upon  performing  the  operation,  the  unit  may  then  change  the  sign,  take  the  absolute 
value,  or  take  the  negative  of  the  absolute  value  of  the  result.  The  FPILs,  in  addition 
to  performing  the  above  operations,  are  also  used  for  fixed-point  addition,  subtraction,  or 
multiplication,  or  to  perform  logical  comparisons  on  either  fixed  or  floating-point  values. 
For  both  the  FPs  and  the  FPILs,  the  startup  cost  associated  with  processing  the  first 
value  in  a  vector  is  one  clock  cycle.  There  is  no  startup  cost  associated  with  routing  values 
through a MUX. The clock for the ALU operates at 50 nsec, giving an operation rate of 20
Mflops for each FP and FPIL. Hence, by configuring a pipeline which utilizes all of the FPs
and  FPILs  in  the  ALU,  the  Nodal  peak  speed  of  640  Mflops  may  be  attained. 
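The peak figures quoted for one Node and for a 64-Node system are mutually consistent. A quick check, assuming one result per 50 nsec clock from each of the 16 FPs and 16 FPILs, 4-byte words, and a factor of 1024 when converting Mwords to Gwords:

```python
CLOCK_NS = 50                  # ALU clock period in nsec
PROCESSING_UNITS = 16 + 16     # FPs + FPILs in one Node
NODES = 64

mflops_per_unit = 1e3 / CLOCK_NS                           # one result per clock: 20 Mflops
node_peak_mflops = mflops_per_unit * PROCESSING_UNITS      # 640 Mflops
system_peak_gflops = node_peak_mflops * NODES / 1e3        # 40.96 GFLOPS

PLANES, PLANE_MBYTES, WORD_BYTES = 16, 128, 4
node_mwords = PLANES * PLANE_MBYTES // WORD_BYTES          # 512 Mwords
system_gwords = node_mwords * NODES / 1024                 # 32 Gwords
```

Both system totals agree with the 64-Node projection given at the start of this section (more than 40 GFLOPS and 32 Gwords).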

Associated with each of the FPs and FPILs is a 32-bit, 32-word register file, as illustrated
in figure 3. Each clock cycle, two words may be read from and written to the register
file. Words are loaded into the register files from FLONET as either fixed constants or
as elements of a vector which are to be delayed. To illustrate the process of delaying and
offsetting the elements of a vector, called "vector latching," consider a computation where
u_{i-1} is to be added to u_{i+1}, with u_{i-1}, u_i, u_{i+1}, u_{i+2}, ... stored in sequential memory
locations of a single memory plane. In this process, the elements of the vector being accessed
from FLONET constitute the values for vector u_{i+1}. These values are routed to the input
port of the FP/FPIL, and simultaneously routed to the register file. Two clock cycles after
an element is accessed, it is retrieved from the register file and routed to the other input
port of the FP/FPIL, whereupon it is added to the element presently being received from
FLONET, which is u_{i+3}. Hence, each element of the vector is used twice in the computation,
first as u_{i+1} for the computation at i, and then as u_{i-1} for the computation at i + 2.
In this manner, the set of u values may be stored in a single memory plane, rather than
having copies of u stored in multiple memory planes. In a similar manner to the vector
latching process, the register files play an integral part in performing processes such as
vector recursion and summing the elements of a vector.
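The vector-latching example can be mimicked in a few lines. This is a behavioural sketch only: it models neither FLONET nor the register-file ports. Each element goes straight to the adder and into a FIFO standing in for the register file, and re-emerges two clocks later.

```python
from collections import deque

def latched_sum(stream, delay=2):
    """Model of vector latching: each element read from memory is sent both to
    the adder and into a FIFO (standing in for the register file); `delay`
    clocks later it leaves the FIFO and is added to the element arriving then."""
    register_file = deque()
    sums = []
    for value in stream:            # one element per clock
        register_file.append(value)
        if len(register_file) > delay:
            sums.append(value + register_file.popleft())
    return sums

u = [3, 1, 4, 1, 5, 9, 2, 6]
print(latched_sum(u))   # [7, 2, 9, 10, 7, 15], i.e. u[i-1] + u[i+1] for i = 1 .. 6
```

Each value of u is read from memory only once, yet takes part in two sums, which is the point of the technique.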

The  sixteen  ALU  functional  units  are  formed  by  hardwiring  combinations  of  the  FPs, 
FPILs,  and  MUXs  together.  Three  different  types  of  functional  units  are  formed  from 
these  processing  units,  as  illustrated  in  figure  4.  There  are  four  type  1  functional  units, 
which  consist  of  a  single  FPIL,  and  eight  type  2  functional  units,  which  consist  of  one  FP, 
one  MUX,  and  one  FPIL.  In  the  type  2  units,  the  MUX  is  used  as  a  switch  to  connect 
one of the FPIL input ports to either the output port of the FP, or directly to an output
port  of  FLONET.  Thus,  type  2  functional  units  may  be  configured  to  behave  like  type  1 
units.  Type  3  functional  units,  of  which  there  are  four,  consist  of  two  FPs  and  an  FPIL, 
where  results  from  the  FPs  are  input  to  the  FPIL.  It  should  be  pointed  out  that  whereas 
Figure 3. Schematic of the interconnects between the register files and the FP/FPILs.


Figure 4. ALU Functional Unit types (Type 3: 4 units; Type 2: 8 units; Type 1: 4 units).
division and the evaluation of transcendental functions cannot be performed within a
single  functional  unit,  the  Nodal  architecture  does  provide  a  means  for  performing  these 
operations.  This  involves  the  utilization  of  FLONET  to  interconnect  two  type  2  functional 
units,  and  then  using  this  functional  unit  combination  to  perform  the  required  operation. 
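The unit counts of figure 4 account exactly for the sixteen FPs, sixteen FPILs, and eight MUXs listed earlier. A small bookkeeping check (the table layout is illustrative):

```python
# Per functional-unit type: (number of units, FPs, MUXs, FPILs in each unit)
UNIT_TYPES = {
    "type 1": (4, 0, 0, 1),
    "type 2": (8, 1, 1, 1),
    "type 3": (4, 2, 0, 1),
}

def inventory(unit_types):
    """Total functional units, FPs, MUXs and FPILs implied by the unit types."""
    units = fps = muxs = fpils = 0
    for count, fp, mux, fpil in unit_types.values():
        units += count
        fps += count * fp
        muxs += count * mux
        fpils += count * fpil
    return units, fps, muxs, fpils

print(inventory(UNIT_TYPES))  # (16, 16, 8, 16)
```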

The  medium  by  which  the  ALU  functional  units  are  interconnected  to  form  parallel 
processing  vector  pipelines,  and  by  which  operand  and  result  vectors  are  routed  to  and  from 
the  ALU,  is  FLONET.  FLONET  consists  of  a  series  of  16x48,  16x2,  and  8x16  MASNET 
type  switch  networks.  It  is  used  to  route  operand  vectors  from  the  Double  Buffered  Cache 
to  the  input  ports  of  the  appropriate  ALU  functional  units,  to  route  intermediate  results 
from  the  ALU  functional  units  back  to  the  input  ports  of  subsequent  ALU  functional  units 
in  the  pipeline,  and  to  route  final  results  from  the  functional  units  back  to  the  Double 
Buffered  Cache.  The  switch  states  of  FLONET  may  be  reset  by  the  microsequencer  every 
clock  cycle.  This  ability  to  reconfigure  FLONET  rapidly  provides  a  dynamically  changing 
vector  processing  environment. 

As  indicated  in  figure  2,  FLONET  is  also  used  to  route  vectors  to  and  from  the 
Shift/Delay  Unit.  The  Shift/Delay  Unit  consists  of  two  independent  units,  with  each  unit 
consisting  of  four  sequential  8  Kword  register  files.  These  units  perform  the  same  type  of 
task  as  performed  by  the  ALU  register  files  in  the  vector  latching  process.  However,  since 
the Shift/Delay register files contain 8 Kwords of memory, vectors may be delayed for up
to  8000  clock  cycles  in  each  register  file.  Furthermore,  each  register  file  may  be  set  with 
a  different  delay,  providing  the  ability  to  generate  multiple  vectors  with  different  offsets 
from  a  single  source. 
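The ability to derive several offset vectors from one source can be modelled with one FIFO per output. The sketch simplifies the hardware by treating each output as a single delay line (the actual unit chains four 8 Kword register files), and the use shown, pairing values of u at offsets of plus and minus one row in a plane stored row by row with I words per row, is a hypothetical illustration rather than a configuration given in the paper:

```python
from collections import deque

def delayed_streams(stream, delays):
    """Return one output stream per delay; each output lags the input by its
    delay in clocks and emits 0 until its delay line has filled."""
    lines = [deque([0] * d) for d in delays]
    outputs = [[] for _ in delays]
    for value in stream:
        for line, out in zip(lines, outputs):
            line.append(value)
            out.append(line.popleft())
    return outputs

# A plane stored row by row with I words per row: adding the undelayed stream
# to a copy delayed by 2*I gives u[n] + u[n - 2I], i.e. the sum of the two
# y-neighbours of point n - I.
I = 3
plane = list(range(12))
direct, delayed = delayed_streams(plane, [0, 2 * I])
neighbour_sums = [a + b for a, b in zip(direct, delayed)][2 * I:]
print(neighbour_sums)   # [6, 8, 10, 12, 14, 16]
```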

As  alluded  to  previously,  the  Node  configuration  is  controlled  by  a  microsequencer. 
This  unit  not  only  controls  the  switch  states  of  MASNET,  the  MASNET  Cache  Router, 
and  FLONET,  but  also  sets  the  delays  in  the  Shift/Delay  Unit  and  specifies  the  operations 
to  be  performed  by  the  ALU  functional  units.  Typically,  this  unit  is  used  to  configure  the 
Node at the start of an array operation, and the configuration remains fixed until completion
of that operation. However, results from logical comparison operations may be used
to  key  the  microsequencer  to  reconfigure  the  pipeline  conditionally  at  every  clock  cycle, 
without  (in  general)  requiring  a  pipeline  flush.  This  capability  permits  the  vectorization  of 
many  powerful  algorithms  which  are  not  vectorizable  on  conventional  vector  architecture 
supercomputers. 

The  operation  of  the  Node  is  controlled  and  monitored  by  a  Node  manager.  The  Node 
manager  is  primarily  used  as  an  intelligent  interface  between  the  Node  and  the  front-end. 
It  provides  initialization,  checkpointing,  and  data  store  handling  capabilities,  and  decodes 
macromachine  instructions  which  are  used  to  configure  the  Node.  In  addition,  the  Node 
manager  provides  scalar  operation  capabilities  at  a  rate  of  2  Mflops.  However,  it  should 
be  emphasized  that  the  Node  manager  rarely  becomes  involved  in  numerical  computations 
other  than  for  evaluating  expressions  for  use  as  constants  in  the  ALU. 

As  an  illustration  of  how  the  Node  is  configured  to  process  a  string  of  vector  elements, 
consider  a  Point  Jacobi  iteration  of  the  central-differenced  3-D  Poisson  equation, 

$$\nabla^2 u = G \qquad (1)$$

on a uniform grid of dimension I × J × K. The update equation is


(2) 
where

$$R_{ijk} = \frac{1}{h^2}\left(u_{i+1,j,k} + u_{i-1,j,k} + u_{i,j+1,k} + u_{i,j-1,k} + u_{i,j,k+1} + u_{i,j,k-1} - 6u_{ijk}\right) - G_{ijk} \qquad (3)$$
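Equation (2) is not legible in this copy. The NumPy sketch below assumes the usual Point Jacobi form u_new = u + (h²/6) R, which sets each new value so that R_ijk would vanish if its six neighbours were held fixed. It also assumes a uniform spacing h, fixed (Dirichlet) boundary values, and the 1/h² scaling of R in (3). It uses array slicing rather than the NSC's streamed vectors, so it shows only the arithmetic of one sweep.

```python
import numpy as np

def residual(u, G, h):
    """R_ijk of Eq. (3) at interior points (1/h**2 scaling); zero on the boundary."""
    R = np.zeros_like(u)
    R[1:-1, 1:-1, 1:-1] = (
        u[2:, 1:-1, 1:-1] + u[:-2, 1:-1, 1:-1]
        + u[1:-1, 2:, 1:-1] + u[1:-1, :-2, 1:-1]
        + u[1:-1, 1:-1, 2:] + u[1:-1, 1:-1, :-2]
        - 6.0 * u[1:-1, 1:-1, 1:-1]
    ) / h**2 - G[1:-1, 1:-1, 1:-1]
    return R

def point_jacobi(u, G, h):
    """One Point Jacobi sweep, assuming the form u_new = u + (h**2 / 6) * R.
    Every R is computed from the old iterate before any value changes."""
    return u + (h**2 / 6.0) * residual(u, G, h)

# Check on u = x^2 + y^2 + z^2, whose discrete Laplacian is exactly 6.
n = 8
h = 1.0 / (n - 1)
x = np.linspace(0.0, 1.0, n)
X, Y, Z = np.meshgrid(x, x, x, indexing="ij")
exact = X**2 + Y**2 + Z**2
G = np.full_like(exact, 6.0)
u = exact.copy()
u[1:-1, 1:-1, 1:-1] = 0.0      # keep boundary values, zero interior guess
for _ in range(300):
    u = point_jacobi(u, G, h)
```

Because every update reads only old values, all I × J × K points can be processed in any order, which is what allows them to be streamed as long vectors.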

The first step in performing this computation is to choose a procedure for allocating memory planes to store the variables u and G. Here it is assumed that the values of u for a given vertical plane (k = const.) are stored within a single memory plane, using four of the memory planes to store all of the values of u. Values of u for planes k (mod 4) = 1 are then stored in memory plane 1 (MP1), planes k (mod 4) = 2 in MP2, planes k (mod 4) = 3 in MP3, and planes k (mod 4) = 0 in MP4. Within a memory plane, values of u for a particular vertical plane of the grid are stored in lexicographic order in x and y. For example, consecutive memory locations in MP1 contain the values of u for grid points

(1,1,1) (2,1,1) (3,1,1) ... (I,1,1)
(1,2,1) (2,2,1) (3,2,1) ... (I,2,1)
...
(1,J,1) (2,J,1) (3,J,1) ... (I,J,1)

Moving up the grid to plane k = 5, the next set of consecutive memory locations in MP1 contain the values of u for grid points

(1,1,5) (2,1,5) (3,1,5) ... (I,1,5)
(1,2,5) (2,2,5) (3,2,5) ... (I,2,5)
...
(1,J,5) (2,J,5) (3,J,5) ... (I,J,5)

This sequence for storing the values of u is repeated for planes k = 9, 13, etc., up through plane k = K − 3. The remaining values of u are distributed in a like manner over MPs 2-4. Similarly, the values of G are stored in MPs 5-8.
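The allocation rule can be written as a small index map. The sketch below is illustrative only: `memory_plane` and `plane_offset` are hypothetical helper names, and it assumes 1-based indices, i varying fastest, and the stored k-planes of one residue class (k = 1, 5, 9, ...) placed back to back within a memory plane, as described above.

```python
def memory_plane(var, k):
    """Memory plane (1-8) holding grid plane k of `var` ('u' or 'G')."""
    base = {"u": 1, "G": 5}[var]      # u in MPs 1-4, G in MPs 5-8
    return base + (k - 1) % 4         # k = 1, 5, 9, ... -> base; k = 4, 8, ... -> base + 3


def plane_offset(i, j, k, I, J):
    """Word offset of grid point (i, j, k) inside its memory plane."""
    block = (k - 1) // 4              # 0 for k = 1..4, 1 for k = 5..8, ...
    return block * I * J + (j - 1) * I + (i - 1)


# Operands of the update at point (5,5,5) on an 8x8 cross-section
print(memory_plane("u", 6), memory_plane("u", 4), memory_plane("G", 5))
print(plane_offset(6, 5, 5, 8, 8) - plane_offset(5, 5, 5, 8, 8))
```

Under these assumptions, neighbors in i sit one word apart and neighbors in j sit I words apart within the same plane, which is what lets a fixed pipeline read each term from a fixed memory plane with stride 1.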

Upon  initialization  of  the  values  of  u  and  G,  the  process  for  performing  the  Point  Jacobi 
iteration  begins  by  streaming  operand  vectors  out  of  the  memory  planes  to  FLONET  and 
the  ALU.  Elements  of  these  vectors  are  accessed  using  a  constant  stride  of  1,  and  all 
values  of  u  within  a  given  memory  plane  are  updated  before  beginning  the  update  of  u  in 
subsequent  memory  planes.  This  procedure  gives  vector  lengths  of  IxJxK/4. 

Considering the update of u for values stored in MP1, at an internal grid point, say point (5,5,5), the values of $u_{i\pm1,j,k}$, $u_{i,j\pm1,k}$, and $u_{i,j,k}$ reside in MP1, and the values of $u_{i,j,k+1}$, $u_{i,j,k-1}$, and $G_{i,j,k}$ reside in MPs 2, 4, and 5, respectively. For the computation at grid point (6,5,5), the operand values reside in the next sequential memory location of the same memory planes, indicating that once the pipeline is set, a given term is always accessed from the same memory plane. In order to generate multiple vectors from the values of u stored in MP1, the Shift/Delay Unit is utilized. In this process, as the values of u enter the Shift/Delay Unit, they are immediately routed out of the 1st register file of the unit, constituting the values for vector $u_{i,j+1,k}$. The values are also routed simultaneously to the 2nd register file, where they are delayed I − 1 clock cycles and then routed out as vector $u_{i+1,j,k}$. The values are delayed one more clock cycle in the 3rd register file and routed out as vector $u_{i,j,k}$, and then delayed I clock cycles in the 4th register file and routed out as vector $u_{i,j-1,k}$ (note that I must be < 8000 for this process). Values of $u_{i-1,j,k}$ are generated from vector $u_{i+1,j,k}$ in the register file of an FP using the vector latching process described previously.
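The effect of the four register files can be mimicked in software with a single delay line tapped at 0, I − 1, I, and 2I cycles. This is a hypothetical sketch (`shift_delay_taps` is not NSC software): it feeds in one lexicographic stream and yields, for each clock once the line is full, the four aligned operands for the centre point p = t − I. Row-end and plane-edge points, where the i + 1 or j ± 1 tap would run into the next row or plane, are assumed to be handled separately.

```python
from collections import deque


def shift_delay_taps(stream, I):
    """Yield (u[i,j+1], u[i+1,j], u[i,j], u[i,j-1]) from one lexicographic stream.

    Taps are delayed 0, I-1, I and 2I clock cycles, mirroring the four
    register files of the Shift/Delay Unit described in the text.
    """
    history = deque(maxlen=2 * I + 1)   # holds s[t-2I] ... s[t]
    for t, value in enumerate(stream):
        history.append(value)
        if t >= 2 * I:                  # line is full: all four taps are valid
            yield (history[-1],         # delay 0     -> s[t]       = u_{i,j+1,k}
                   history[-I],         # delay I-1   -> s[t-I+1]   = u_{i+1,j,k}
                   history[-I - 1],     # delay I     -> s[t-I]     = u_{i,j,k}
                   history[0])          # delay 2I    -> s[t-2I]    = u_{i,j-1,k}


# Feed flat indices of a 7x7 plane so each tap shows which point it carries
for taps in list(shift_delay_taps(range(49), 7))[:3]:
    print(taps)
```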

An ALU pipeline for performing the Point Jacobi iteration is illustrated in figure 5. Neither inactive register files nor MUXs are indicated in this figure. At the first level of computations, six operand vectors are accepted in parallel at the input ports of two of the type 3 functional units. Note that in the addition of $u_{i+1,j,k}$ and $u_{i-1,j,k}$, the vector latching capability of the register file is utilized. Results from the two operations are routed back through FLONET and directed to a type 2 functional unit, where they are added together and then multiplied by a constant. This intermediate result is routed back through FLONET and directed to a type 1 functional unit for completion of the residual calculation. As the vector elements for the residual are routed through FLONET, they are then split into two equivalent vectors.

Figure 5. Block diagram of the pipeline for performing a Point Jacobi iteration of the 3-D Poisson equation. The operation performed by each ALU unit is indicated by +, −, or ×. R.F. denotes the register file utilized in the vector latching procedure. Other register files are only indicated for those ALU units which access constants.

One vector for the residual is routed to a type 2 functional unit where an in-line local convergence check is begun. In this example, the convergence criterion is

$$|R_{i,j,k}| < \epsilon \qquad (4)$$

In the local convergence check, the type 2 functional unit is used to subtract ε from the
absolute  value  of  the  residual.  The  three  type  1  functional  units  at  the  end  of  the  pipeline 
perform  logical  operations  on  this  result  to  determine  whether  or  not  convergence  has  been 
attained  at  a  point;  a  counter  is  incremented  for  each  point  at  which  convergence  has  not 
been  attained.  After  u  has  been  updated  at  all  points,  the  counter  is  polled  to  determine 
if  convergence  has  been  obtained.  Upon  satisfaction  of  this  condition,  a  convergence  flag 
interrupts  the  microsequencer,  and  the  pipeline  is  reconfigured  for  execution  of  the  next 
calculation  procedure  in  the  algorithm. 

The second vector for the residual is routed to a type 2 functional unit for updating the values of u, and the results from this operation are then sent to memory. Note that in order to prevent overwriting of the old values of u (which are required in subsequent computations), the values of $u^{m+1}$ cannot be stored in the memory locations occupied by $u^{m}$. Since the Point Jacobi iteration utilizes 13 ALU processing units for performing floating-point operations, the nominal operation rate for this process, including the in-line local convergence check, is 260 Mflops per Node.
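For readers who want to reproduce the arithmetic of eqs. (2)-(4) off-line, here is a vectorized NumPy sketch of one Point Jacobi sweep with the in-line convergence counter. It is not the NSC pipeline; the function names are hypothetical, and it assumes the outer layer of u holds fixed (Dirichlet) boundary values. As in the text, the new iterate goes into a separate array so that $u^m$ is not overwritten.

```python
import numpy as np


def point_jacobi_sweep(u, G, dx, eps):
    """One Point Jacobi sweep of the 7-point form of lap(u) = G, eqs. (2)-(4).

    Returns (u_new, n_unconverged); outer-layer values of u are held fixed.
    """
    c = u[1:-1, 1:-1, 1:-1]
    R = (u[2:, 1:-1, 1:-1] + u[:-2, 1:-1, 1:-1]      # u_{i+1,j,k} + u_{i-1,j,k}
         + u[1:-1, 2:, 1:-1] + u[1:-1, :-2, 1:-1]    # u_{i,j+1,k} + u_{i,j-1,k}
         + u[1:-1, 1:-1, 2:] + u[1:-1, 1:-1, :-2]    # u_{i,j,k+1} + u_{i,j,k-1}
         - 6.0 * c) / dx**2 - G[1:-1, 1:-1, 1:-1]    # residual, eq. (3)
    u_new = u.copy()                                 # u^{m+1} kept apart from u^m
    u_new[1:-1, 1:-1, 1:-1] = c + (dx**2 / 6.0) * R  # update, eq. (2)
    # Counter: points where |R| - eps >= 0 have not yet satisfied eq. (4)
    n_unconverged = int(np.count_nonzero(np.abs(R) - eps >= 0.0))
    return u_new, n_unconverged


def jacobi_solve(u, G, dx, eps, max_sweeps=10_000):
    """Repeat sweeps until the counter reports convergence at every point."""
    for sweep in range(1, max_sweeps + 1):
        u, n_bad = point_jacobi_sweep(u, G, dx, eps)
        if n_bad == 0:
            return u, sweep
    return u, max_sweeps
```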

2.3  Internode  Communication 

Internode addressing on the NSC is supported by two addressing modes, global addressing and explicit boundary-point definition (BPD). We describe here only BPD addressing. BPD involves the explicit definition of all boundary-point data, i.e. data associated with points on the boundary of the computational subdomain. This definition includes the source Node and address of the data, and all destination addresses. For local discretization schemes, this is typically the only data which needs to be communicated to other Nodes. As an example of BPD addressing, consider the Point Jacobi iteration procedure discussed previously. For an internal Node, i.e. one which does not contain any points on the boundaries of the physical domain, all values of u at grid points for which i = 1 or I, j = 1 or J, or k = 1 or K would be explicitly defined as boundary-point data.

The internode communication process begins in the source Node as results enter the MASNET Cache Router en route to memory. If a result has been defined as a boundary-point value, it is immediately routed from the MASNET Cache Router to the Boundary Cache of the source Node, while simultaneously being routed through the switch element.
The Boundary Cache is linked to specific switch elements in the MASNET Cache Router
times faster than the clock for the memory planes and the ALU. Thus, every clock cycle
it  is  possible  to  extract  a  boundary  value  from  each  of  four  switch  elements.  Once  the 
data  is  received  in  the  Boundary  Cache,  it  is  immediately  sent  to  the  Hyperspace  Router 
of  the  source  Node,  and  then  directed  to  the  Hyperspace  Router  of  the  destination  Node. 
From  there  it  is  sent  to  the  Boundary  Cache  of  that  Node.  Then  when  boundary-point 
data  is  needed  in  an  ongoing  computation  of  the  destination  Node,  it  is  accessed  from  the 
Boundary  Cache  of  that  Node  and  inserted  into  the  MASNET  Cache  Router.  The  BPD 
data is then sent to the Double Buffered Cache along with the operands being accessed from the memory planes of the destination Node. A schematic of the hardware interconnects
between  the  Hyperspace  Router  and  MASNET  Cache  Router  is  presented  in  figure  6. 

The  local  Hyperspace  Routers  are  non-blocking  permutation  switch  networks  which  are 
used  to  route  boundary-point  data  to  the  appropriate  internode  communication  links.  The 
data  are  self-routing  in  that  the  destination  addresses,  which  are  carried  with  the  data,  are 
used  to  set  the  Hyperspace  Router  switch  states.  For  a  128  Node  NSC,  the  Hyperspace 
Routers  contain  8x8  switch  networks,  and  the  hypercube  internode  communication  network 
links  each  Node  to  seven  neighboring  Nodes.  Although  internode  communication  is  most 
easily  accomplished  for  data  transfers  in  which  the  destination  Node  is  directly  linked  to 
the  source  Node,  data  may  be  transferred  between  any  two  Nodes  by  routing  the  data  over 
a  series  of  Hyperspace  Routers. 
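The text does not specify which sequence of Hyperspace Routers a multi-hop transfer uses. One common deterministic choice for hypercubes is dimension-order ("e-cube") routing, in which the address bits that differ between source and destination are corrected one at a time. The sketch below assumes Nodes are numbered 0-127 so that directly linked Nodes differ in exactly one address bit; it illustrates the idea and is not the NSC's documented routing algorithm.

```python
def hypercube_neighbors(node, dim=7):
    """Nodes directly linked to `node` in a 2**dim-node hypercube."""
    return [node ^ (1 << b) for b in range(dim)]


def ecube_route(src, dst, dim=7):
    """Node sequence from src to dst, fixing differing address bits
    from least to most significant (dimension-order routing)."""
    path = [src]
    node = src
    for b in range(dim):
        if (node ^ dst) & (1 << b):
            node ^= 1 << b          # one hop across dimension b
            path.append(node)
    return path


print(hypercube_neighbors(0))
print(ecube_route(0, 127))
```

The number of hops equals the number of differing address bits, so in a 128-Node machine no transfer needs more than seven router traversals under this scheme.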

The  internode  communication  links  are  implemented  with  fiber-optic  cables,  providing 
data  transmission  in  byte-serial  format  at  a  duplex  rate  of  1  Gbyte/sec.  The  Boundary 
Cache  of  each  Node  consists  of  a  1  Mword  write-through  cache.  For  BPD  addressing,  the 
Boundary  Cache  is  continuously  updated  by  pre-communicating  the  boundary-point  data 
as  it  is  generated  in  the  source  Node.  Thus,  current  boundary  data  is  usually  maintained 
within  the  Boundary  Cache  of  each  Node,  eliminating  most  of  the  internode  communication 
overhead. 




Figure  6.  Block  diagram  of  interconnects  between  the  MASNET  Cache  Router,  Boundary 
Cache,  and  Hyperspace  Router. 
Internode communication delays which result in the temporary suspension of computations may occur in a number of situations. One such situation is when boundary-point
values  are  not  present  in  the  Boundary  Cache  of  the  Node  at  the  time  they  are  required 
in  an  ongoing  computation.  The  computations  must  then  be  suspended  while  those  values 
are  retrieved  from  other  Nodes.  This  type  of  delay  is  likely  to  occur  in  problems  for  which 
the  amount  of  BPD  data  required  by  each  Node  is  greater  than  the  storage  capacity  of  the 
Boundary  Cache. 

Delays may also occur in routing the BPD data out of the Hyperspace Routers. This
situation  arises  during  burst  transmissions  where  a  significant  portion  of  the  BPD  data 
is  transferred  between  Nodes  which  are  not  directly  linked  by  the  hypercube  network. 
Since  the  BPD  data  must  then  be  routed  over  a  series  of  Hyperspace  Routers,  the  switch 
networks  of  the  routers  may  become  overloaded.  If  the  overloading  is  severe  enough,  the 
BPD  data  will  not  be  present  in  the  destination  Node  at  the  time  it  is  required  in  an  ongoing 
computation,  causing  a  temporary  suspension  in  the  computations.  This  type  of  internode 
communication  delay  is  most  likely  to  occur  in  algorithms  for  which  the  computations 
are  not  localized  (e.g.  spectral  methods).  For  algorithms  in  which  the  computations  are 
localized  (e.g.  finite-difference  methods),  the  amount  of  data  which  must  be  transferred 
between  Nodes  is  small  enough  that  delays  in  routing  the  data  out  of  the  Hyperspace 
Routers  are  unlikely  to  cause  a  suspension  in  the  computations,  particularly  since  the  data 
is  pre-communicated. 

3.  DIRECT  SOLUTION  OF  THE  NAVIER-STOKES  EQUATIONS 

As an initial step towards determining how well transition and turbulence algorithms perform on the NSC, the simplest such algorithm for the direct simulation of isotropic turbulence is considered here. The governing equations for this problem are the incompressible, time-dependent Navier-Stokes equations with constant viscosity. Written in rotation form, the non-dimensionalized equations are

$$\mathbf{u}_t = \mathbf{u}\times\boldsymbol{\Omega} - \nabla P + \frac{1}{Re}\nabla^2\mathbf{u}, \qquad \nabla\cdot\mathbf{u} = 0 \qquad (5)$$

where

$$\boldsymbol{\Omega} = \nabla\times\mathbf{u}, \qquad P = p + \tfrac{1}{2}|\mathbf{u}|^2,$$

u denotes the velocity, p the pressure, and Re the Reynolds number. Boundary conditions for the isotropic turbulence problem are periodic in all three directions.

The solution algorithm is based on a time-splitting scheme in which the solution is advanced from $t = t^n$ to $t = t^{n+1}$ as follows: in the first step,

$$\mathbf{u}_t = \mathbf{u}\times\boldsymbol{\Omega} + \frac{1}{Re}\nabla^2\mathbf{u} \qquad (6)$$

is integrated from $t^n$ to the intermediate time $t^*$; in the second step,


Nosenchuck,  Krist,  and  Zang 

503 

$$\mathbf{u}_t = -\nabla P, \qquad \nabla\cdot\mathbf{u} = 0 \qquad (7)$$

is integrated from $t^*$ to $t^{n+1}$.

The temporal discretization of the resulting system of equations couples a third-order low-storage Runge-Kutta treatment of the advection term with a Crank-Nicolson treatment of the diffusion term in the velocity step, and a backward Euler pressure correction applied after each Runge-Kutta stage. The steps of the solution procedure for the three Runge-Kutta stages may be written in low-storage form as

$$\mathbf{F}_i = \Delta t\,(\mathbf{u}_i\times\boldsymbol{\Omega}_i) - c_a\mathbf{F}_{i-1} \qquad (8)$$

$$\mathbf{L}_i = \mathbf{u}_i + b_a\mathbf{F}_i - b_h\nabla^2\mathbf{u}_i \qquad (9)$$

$$\mathbf{u}^*_i + b_h\nabla^2\mathbf{u}^*_i = \mathbf{L}_i \qquad (10)$$

$$K_i = \nabla\cdot\mathbf{u}^*_i \qquad (11)$$

$$\nabla^2 P_i = K_i \qquad (12)$$

$$\mathbf{u}_{i+1} = \mathbf{u}^*_i - \nabla P_i \qquad (13)$$

where i = 1, 2, 3 denotes the Runge-Kutta stage, $\mathbf{u}_1$ is associated with $\mathbf{u}^n$, and $\mathbf{u}_4$ is associated with $\mathbf{u}^{n+1}$. The coefficients $c_a$, $b_a$, and $b_h$ for the three Runge-Kutta stages
are  listed  in  table  1.  This  particular  temporal  discretization  has  been  used  extensively  in 
transition  simulations  [7],  and,  except  for  very  low  Reynolds  number  flows,  yields  a  method 
which  is  effectively  third-order  accurate  and  is  asymptotically  stable  in  time  (provided  that 
the  time-step  is  smaller  than  the  advection  Courant  limit). 
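The Table 1 coefficients can be checked with exact rational arithmetic. The sketch below (hypothetical helper names) applies eq. (8) together with the advection-only update $\mathbf{u}_{i+1} = \mathbf{u}_i + b_a\mathbf{F}_i$ to the model problem $u' = \lambda u$, omitting the diffusion and pressure stages. It shows that one step reproduces the third-order Taylor polynomial $1 + z + z^2/2 + z^3/6$ with $z = \lambda\Delta t$, and that the stages advance the solution by 1/3, 5/12, and 1/4 of $\Delta t$. Half of each fraction, times $\Delta t/Re$ and with a minus sign, is exactly the $b_h$ column, which is what makes eqs. (9)-(10) a Crank-Nicolson treatment of each substep.

```python
from fractions import Fraction as Fr

C_A = (Fr(0), Fr(5, 9), Fr(153, 128))   # Table 1, c_a
B_A = (Fr(1, 3), Fr(15, 16), Fr(8, 15))  # Table 1, b_a


def low_storage_rk3_linear(z):
    """One step of u' = lam*u (z = dt*lam) with eq. (8) and u <- u + b_a F."""
    u, F = Fr(1), Fr(0)
    for ca, ba in zip(C_A, B_A):
        F = z * u - ca * F      # eq. (8) with u x Omega replaced by lam*u
        u = u + ba * F
    return u


def stage_fractions():
    """Fraction of dt covered by each stage (the O(z) part of each F_i)."""
    F1, fracs = Fr(0), []
    for ca, ba in zip(C_A, B_A):
        F1 = 1 - ca * F1        # O(z) coefficient of F_i
        fracs.append(ba * F1)   # time advanced during this stage
    return fracs


print(stage_fractions())
print([-f / 2 for f in stage_fractions()])   # b_h in units of dt/Re
```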

Since the NSC is tailored towards algorithms in which the computations are localized, finite-differencing is employed in the spatial discretization. For simplicity, second-order central differencing is employed in the discretization of eqs. (8)-(10). The discretized Poisson operator written in eq. (12) must be the composition of the discrete divergence and gradient operators in order for eq. (7) to be satisfied exactly by the discrete solution. In order to maintain a consistent Poisson equation, the gradient in eq. (13) is treated with backward differencing, while the divergence operator of eq. (11) is treated with forward differencing. This discretization generates the standard representation for the second-order, central-differenced Poisson equation. The computational grid for this problem is Cartesian with a non-staggered uniform grid in all three spatial directions. From Fourier analysis of this problem on a 3-D grid, it can be shown that modes $k_x = k_y = k_z = 0$ and $N/2$ comprise the kernel of the discrete Poisson operator, where $k_x$, $k_y$, and $k_z$ denote the Fourier wavenumbers, and N grid points are used in each direction. These modes must be filtered from the right hand side of eq. (12).

Table 1: Coefficients for 3rd-order Runge-Kutta

Runge-Kutta stage | c_a | b_a | b_h
1 | 0 | 1/3 | −Δt/(6Re)
2 | 5/9 | 15/16 | −5Δt/(24Re)
3 | 153/128 | 8/15 | −Δt/(8Re)
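The consistency requirement above can be verified numerically. In the NumPy sketch below (hypothetical function names, periodic boundaries as in the isotropic turbulence problem), the forward-difference divergence of a backward-difference gradient reproduces the standard 7-point Laplacian exactly. The sketch also checks that the forward-difference divergence summed over a periodic grid telescopes to zero, the solvability condition associated with the constant (k = 0) mode in the kernel.

```python
import numpy as np


def grad_backward(p, dx):
    """Backward-difference gradient on a periodic grid (as in eq. (13))."""
    return tuple((p - np.roll(p, 1, axis=a)) / dx for a in range(3))


def div_forward(u, v, w, dx):
    """Forward-difference divergence on a periodic grid (as in eq. (11))."""
    return ((np.roll(u, -1, axis=0) - u)
            + (np.roll(v, -1, axis=1) - v)
            + (np.roll(w, -1, axis=2) - w)) / dx


def laplacian_7pt(p, dx):
    """Standard second-order central 7-point Laplacian, periodic."""
    return sum(np.roll(p, 1, axis=a) + np.roll(p, -1, axis=a) - 2.0 * p
               for a in range(3)) / dx**2


rng = np.random.default_rng(0)
p = rng.standard_normal((8, 8, 8))
print(np.max(np.abs(div_forward(*grad_backward(p, 0.1), 0.1) - laplacian_7pt(p, 0.1))))
```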

The  most  time-consuming  portions  of  the  computational  work  for  this  algorithm  are  the 
solutions  of  the  Helmholtz  equations  for  the  velocity  components  (10),  and  solution  of  the 
Poisson  equation  for  pressure  (12).  In  this  report,  two  relaxation  schemes  are  considered: 
Point  Jacobi  and  Red-Black  SOR.  The  Point  Jacobi  method  is  described  in  the  previous 
section.  Red-Black  SOR  is  an  explicit  two-color  point  method  in  which  an  over-relaxed 
Gauss-Seidel type iterative scheme is utilized. A complete iteration involves updating red points (i + j + k odd) first, and then updating black points (i + j + k even) using the latest available values in the residual calculation.
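A vectorized NumPy sketch of Red-Black SOR for the same 7-point discretization follows. It is illustrative, not the NSC implementation: `red_black_sor` is a hypothetical name, the outer layer of u is assumed to hold fixed (Dirichlet) values, and the relaxation factor ω is left to the caller. Because every neighbor of a red point is black (and vice versa), updating all points of one color at once is equivalent to a Gauss-Seidel pass over that color. Parity is computed with 1-based indices to match the red/black definition in the text.

```python
import numpy as np


def residual(u, G, dx):
    """Residual of eq. (3) at interior points."""
    c = u[1:-1, 1:-1, 1:-1]
    return (u[2:, 1:-1, 1:-1] + u[:-2, 1:-1, 1:-1]
            + u[1:-1, 2:, 1:-1] + u[1:-1, :-2, 1:-1]
            + u[1:-1, 1:-1, 2:] + u[1:-1, 1:-1, :-2]
            - 6.0 * c) / dx**2 - G[1:-1, 1:-1, 1:-1]


def red_black_sor(u, G, dx, omega, sweeps):
    """In-place Red-Black SOR sweeps; red = (i + j + k) odd, updated first."""
    n0, n1, n2 = u.shape
    # 1-based indices of interior points (2 .. n-1), as in the text
    i, j, k = np.meshgrid(np.arange(2, n0), np.arange(2, n1),
                          np.arange(2, n2), indexing="ij")
    red = (i + j + k) % 2 == 1
    black = ~red
    interior = u[1:-1, 1:-1, 1:-1]          # view: writes go into u
    for _ in range(sweeps):
        for mask in (red, black):           # black pass sees the new red values
            R = residual(u, G, dx)
            interior[mask] += omega * (dx**2 / 6.0) * R[mask]
    return u
```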

Of course, these relaxation schemes must be coupled with multigrid acceleration to be at all practical, especially on a machine such as a 64 Node NSC, which can accommodate grids of over 1000³. A detailed discussion of multigrid procedures based on under-relaxed Point Jacobi for 2-D Poisson equations has been provided by Stüben and Trottenberg [8]. The extension to 3-D is straightforward. Even with simple injection for the restriction operator and trilinear interpolation for prolongation, smoothing rates on the order of 0.6 are achievable with non-stationary relaxation. A 3-D multigrid algorithm based on Red-Black Gauss-Seidel relaxation has been reported by Holter [9].
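For context, a standard local Fourier analysis (a textbook result, not taken from this paper) gives the smoothing factor of stationary ω-damped Point Jacobi for the 3-D 7-point Laplacian. The amplification of mode θ is $1 - \omega(1 - \tfrac{1}{3}\sum_d \cos\theta_d)$, and the smoothing factor is its largest magnitude over high frequencies (some $|\theta_d| \ge \pi/2$). The best stationary choice, ω = 6/7, gives 5/7 ≈ 0.71, which is one way to see why non-stationary relaxation is used to reach rates near 0.6. The sketch samples θ on a grid that contains the extremal points; `jacobi_smoothing_factor` is a hypothetical name.

```python
import math


def jacobi_smoothing_factor(omega, n=32):
    """Fourier smoothing factor of omega-damped Jacobi, 3-D 7-point Laplacian.

    Maximizes |1 - omega*(1 - mean cos theta)| over sampled high-frequency
    theta (at least one |theta_d| >= pi/2); the grid includes 0, +-pi/2, +-pi.
    """
    thetas = [-math.pi + 2 * math.pi * s / n for s in range(n + 1)]
    cos_t = [math.cos(t) for t in thetas]
    high = [abs(t) >= math.pi / 2 - 1e-12 for t in thetas]
    mu = 0.0
    for a in range(n + 1):
        for b in range(n + 1):
            for c in range(n + 1):
                if high[a] or high[b] or high[c]:
                    mean_cos = (cos_t[a] + cos_t[b] + cos_t[c]) / 3.0
                    mu = max(mu, abs(1.0 - omega * (1.0 - mean_cos)))
    return mu


print(jacobi_smoothing_factor(6 / 7))   # close to 5/7
print(jacobi_smoothing_factor(1.0))     # undamped Jacobi does not smooth
```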

A  related  application  of  the  NSC,  which  has  been  discussed  elsewhere  [10,11],  is  the 
simulation  of  wall-bounded  flows  which  are  periodic  in  the  directions  parallel  to  the  wall. 
In  this  case,  a  non-uniform  grid  will  be  required  in  the  direction  normal  to  the  wall.  Point 
relaxation  schemes  are  very  inefficient  for  such  a  problem,  even  with  multigrid  acceleration. 
If  the  problem  were  two-dimensional,  then  alternating  line  relaxation  would  suffice  [12]. 
However, this is not sufficient for a 3-D problem. A promising approach is the method proposed by Cline, Schaffer and Slaughter [13], where line relaxation (in the normal direction)
is  coupled  with  multigrid.  The  grid  coarsening  is  performed  only  in  the  two  directions 
parallel  to  the  wall  (in  which  a  uniform  grid  is  used).  Smoothing  rates  on  the  order  of  0.5 
appear  achievable. 
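With second-order three-point differencing in the wall-normal direction (an assumption here; the text does not give the stencil), each line relaxation step reduces to one tridiagonal solve per wall-normal grid line, with non-uniform spacing affecting only the coefficients. The Thomas algorithm below is the usual O(n) way to do this; it is a generic sketch without pivoting, which is adequate for diagonally dominant systems such as these.

```python
def thomas(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i], i = 0..n-1.

    a[0] and c[n-1] are ignored. No pivoting: assumes diagonal dominance.
    """
    n = len(d)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):                   # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):          # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


print(thomas([0.0, 1.0, 1.0, 1.0], [4.0, 4.0, 4.0, 4.0],
             [1.0, 1.0, 1.0, 0.0], [6.0, 12.0, 18.0, 19.0]))
```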

In order to verify the suitability of this algorithm for simulating transition and turbulence problems, calculations of the Taylor-Green vortex problem were performed on a conventional supercomputer, the Cray 2. This is an isotropic turbulence problem which has been described and studied extensively by Brachet et al. [14]. Two related codes were employed. The first uses single grid Point Jacobi for solution of the Helmholtz and Poisson equations. The second code also uses single grid Point Jacobi for solution of the Helmholtz equations. However, for the Poisson equation, which has a much worse condition number, a multigrid method was incorporated into the Point Jacobi relaxation scheme.

Production runs for the simulation of the Taylor-Green vortex problem were performed with the multigrid code on a 64³ grid, for Reynolds numbers of 100, 200, and 400 [11].
The  results  up  to  t  =  2.5  compare  well  with  the  results  in  [14].  However,  the  resolution  of 
the  grid  quickly  becomes  inadequate  after  this  point,  particularly  for  the  Reynolds  number 
200  and  400  cases. 

Timing results for performing this simulation on the Cray 2 are summarized in table 2. These are based on a convergence criterion of reducing the L2-norm of the residual by a factor of $10^{-7}$. The results present the time required to advance the solution one complete time step, the smoothing rate for the iterative scheme, and the fraction of time which is spent on solution of the Poisson equation at t = 2.5, which is about 1/3 of the way through the simulation. For the 128³ grid, the calculated smoothing rates for the methods are a bit misleading in that at this point in the Taylor-Green vortex simulation, the energy spectra of the higher wavenumber modes have yet to grow significantly. Thus, one would expect some degradation in convergence rates for the general isotropic turbulence simulation, and a corresponding increase in the computational time per step. As indicated by the results, the greatest portion of the computational work goes towards solution of the Poisson equation. Clearly, the single grid Point Jacobi method is inadequate for solving the Poisson equation in a reasonable amount of time. Results for the multigrid method are much more reasonable, but the Poisson solution still accounts for about 75% of the total computational time.

Table 2: Cray 2 timing results for a single time step on 32³, 64³, and 128³ grids. Time is given in seconds, P.E. frac. denotes the fraction of time spent on solution of the Poisson equation, μ the smoothing rate, PJ the single grid Point Jacobi method, and PJMG the Point Jacobi Multigrid method. The 32³ and 64³ results are for the time step at t = 2.5; the 128³ results are for the step at t = 0.

Iterative method | 32³: Time, P.E. frac., μ | 64³: Time, P.E. frac., μ | 128³: Time, P.E. frac., μ
PJ | 33.2, .963, .976 | 525, .986, .992 | 19784, .997, .999
PJMG | 4.45, .745, .583 | 34.1, .754, .606 | 191, .673, .453
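The size of the multigrid gain in Table 2 can be made explicit with a little arithmetic on the tabulated values (no new data): the ratio of single grid to multigrid time per step, and the Poisson-solve time per step implied by Time × P.E. frac. for the multigrid code.

```python
# Cray 2 time per step in seconds, from Table 2: grid size -> (PJ, PJMG)
times = {32: (33.2, 4.45), 64: (525.0, 34.1), 128: (19784.0, 191.0)}
pe_frac_pjmg = {32: 0.745, 64: 0.754, 128: 0.673}

speedup = {n: pj / pjmg for n, (pj, pjmg) in times.items()}
poisson_time = {n: times[n][1] * pe_frac_pjmg[n] for n in times}

for n in times:
    print(f"{n}^3: PJMG {speedup[n]:.1f}x faster than PJ; "
          f"Poisson solve {poisson_time[n]:.1f} s of {times[n][1]} s per step")
```

The gain grows with grid size (roughly 7.5, 15, and 100 times), as expected when single grid relaxation slows with refinement.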


4.  IMPLEMENTATION  OF  THE  ALGORITHM  ON  THE  NSC 

The  first  consideration  is  the  distribution  of  the  computational  domain  over  the  Nodes. 
For  now  it  is  assumed  that  an  LxMxN  computational  grid  is  subdivided  into  64  subdomains 
of  dimension  IxJxK,  where  I  =  L/4,  J  =  M/4,  and  K  =  N/4.  In  order  to  simplify  the 
computational  procedures  for  implementation  of  the  multigrid  algorithm,  it  is  further 
assumed  that  I,  J,  and  K  are  powers  of  2.  Conceptually,  the  computational  domain  is 
mapped  into  a  three-dimensional  lattice  of  Nodes  (for  the  multigrid  algorithm,  the  coarsest 
grid  allowable  with  this  mapping  utilizes  one  grid  point  per  Node).  For  the  finite  difference 
algorithm, each interior Node needs to communicate only with its adjacent Nodes. Boundary
Nodes,  i.e.  Nodes  into  which  grid  points  from  the  boundary  of  the  computational  domain 
have  been  mapped,  must  also  communicate  with  non-adjacent  Nodes  in  situations  where 
periodic  boundary  conditions  are  to  be  enforced.  The  hypercube  network  provides  direct 
links  between  all  adjacent  Nodes  in  a  three-dimensional  lattice,  and  between  appropriate 
boundary  Nodes  in  which  periodic  boundary  conditions  are  to  be  enforced. 
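The text states that the hypercube links all adjacent Nodes of the three-dimensional lattice, including the periodic wrap-around pairs, but does not give the Node numbering. One standard construction with this property for a 4×4×4 lattice in a 64-Node (6-dimensional) hypercube is a Gray-code embedding, sketched below as an assumption. `owner` maps a 1-based grid point to its Node's lattice position for I×J×K subdomains; all names are hypothetical.

```python
def gray(x):
    """Binary-reflected Gray code: 0, 1, 3, 2 for x = 0..3."""
    return x ^ (x >> 1)


def node_id(px, py, pz):
    """Hypercube address of lattice position (px, py, pz), each in 0..3.

    Each coordinate gets its own 2-bit Gray-coded field, so a step of +-1
    (mod 4) in any one direction flips exactly one address bit.
    """
    return gray(px) | (gray(py) << 2) | (gray(pz) << 4)


def owner(i, j, k, I, J, K):
    """Lattice position of the Node holding 1-based grid point (i, j, k)."""
    return ((i - 1) // I, (j - 1) // J, (k - 1) // K)


print(node_id(3, 0, 0), node_id(0, 0, 0))   # periodic neighbors in x
print(owner(65, 1, 256, 64, 64, 64))
```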

On  the  nodal  level,  a  procedure  for  allocating  memory  planes  to  store  the  variables 
must  be  chosen.  This  allocation  procedure  is  crucial  to  efficient  implementation  of  the 
algorithm  as  it  not  only  affects  the  ordering  of  values  in  the  operand  and  result  vectors, 
but also influences the actual configuration of the ALU pipelines. The variable u is stored in MPs 1-4, as described in the example of section 2, while v, w, and P are stored in MPs 5-8, 9-12, and 13-16, respectively, using similar allocation procedures. Consequently, at every grid point for which u is stored in MP1, the values of v, w, and P at that grid point are stored in MPs 5, 9, and 13, respectively. Similar relationships between the variables, and the memory planes in which the variables are stored, hold at all other grid points.

The  first  step  of  the  solution  algorithm  is  to  compute  the  advection  term  components 
for  the  first  Runge-Kutta  stage.  This  is  followed  by  the  computation  of  the  right  hand 
side  of  the  Helmholtz  equation  for  u,  and  then  the  solution  of  the  Helmholtz  equation.  A 
similar  procedure  is  followed  for  v  and  w.  Then  the  right  hand  side  of  the  Poisson  equation 
is  calculated,  and  it  is  solved  for  P.  Finally,  the  velocity  is  corrected  from  the  updated 
values  of  the  pressure.  The  calculation  procedures  are  then  repeated  for  the  2nd  and  3rd 
Runge-Kutta  stages. 

4.1  Calculation  of  the  Explicit  Terms 

One of the more complex procedures in this algorithm is the computation of the advection term components. The following discussion illustrates some of the intricacies involved in configuring the Node for a given process.

Using second-order central differencing in the spatial discretization of eq. (8), and denoting the (x, y, z) components of the advection term and the vorticity vector as (F, G, H) and (ξ, η, ζ), respectively, the components of the advection term may be written as

$$F_{i,j,k} = \Delta t\,(v_{i,j,k}\zeta_{i,j,k} - w_{i,j,k}\eta_{i,j,k}) - c_a F_{i,j,k} \qquad (14)$$

$$G_{i,j,k} = \Delta t\,(w_{i,j,k}\xi_{i,j,k} - u_{i,j,k}\zeta_{i,j,k}) - c_a G_{i,j,k} \qquad (15)$$

$$H_{i,j,k} = \Delta t\,(u_{i,j,k}\eta_{i,j,k} - v_{i,j,k}\xi_{i,j,k}) - c_a H_{i,j,k} \qquad (16)$$

where

$$\xi_{i,j,k} = (w_{i,j+1,k} - w_{i,j-1,k} - v_{i,j,k+1} + v_{i,j,k-1})/(2\Delta x) \qquad (17)$$

$$\eta_{i,j,k} = (u_{i,j,k+1} - u_{i,j,k-1} - w_{i+1,j,k} + w_{i-1,j,k})/(2\Delta x) \qquad (18)$$

$$\zeta_{i,j,k} = (v_{i+1,j,k} - v_{i-1,j,k} - u_{i,j+1,k} + u_{i,j-1,k})/(2\Delta x) \qquad (19)$$

Note  that  the  terms  for  F,  G,  and  H  on  the  right  hand  side  of  eqs.(14-16)  are  generated 
in the previous Runge-Kutta stage, and the newly calculated values of these terms are to
be  written  over  the  old  values  in  memory. 
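As an off-line reference for eqs. (14)-(19), the NumPy sketch below evaluates the vorticity components and the advection terms on a periodic grid with arrays indexed [i, j, k]. It is not the two-step NSC pipeline, and `advection_terms` is a hypothetical name. The old F, G, H are passed in and new arrays are returned, whereas the NSC overwrites them in memory.

```python
import numpy as np


def advection_terms(u, v, w, F, G, H, dt, dx, c_a):
    """Eqs. (14)-(19) on a periodic grid (arrays indexed [i, j, k])."""
    def d(f, axis):                 # central difference (f_{+1} - f_{-1}) / (2 dx)
        return (np.roll(f, -1, axis) - np.roll(f, 1, axis)) / (2.0 * dx)

    xi = d(w, 1) - d(v, 2)          # eq. (17)
    eta = d(u, 2) - d(w, 0)         # eq. (18)
    zeta = d(v, 0) - d(u, 1)        # eq. (19)
    F_new = dt * (v * zeta - w * eta) - c_a * F    # eq. (14)
    G_new = dt * (w * xi - u * zeta) - c_a * G     # eq. (15)
    H_new = dt * (u * eta - v * xi) - c_a * H      # eq. (16)
    return (xi, eta, zeta), (F_new, G_new, H_new)
```

With $c_a = 0$ the result is $\Delta t\,(\mathbf{u}\times\boldsymbol{\Omega})$ pointwise, so it is orthogonal to u at every grid point, which makes a convenient sanity check.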

Calculation of the advection term components is performed using a two-step procedure. In the first step, η, ζ, and F are computed at one-fourth of the grid points. Beginning the computation at grid point (1,1,1), consecutive values in the result vectors will be for grid points (1,1,1), (2,1,1), (3,1,1), ... (i.e., the same series of grid points for which u is stored in MP1). From eqs. (14), (18), and (19), calculation of η, ζ, and F at grid point (i,j,k) requires the values of $u_{i,j+1,k}$, $u_{i,j-1,k}$, $u_{i,j,k+1}$, $u_{i,j,k-1}$, $v_{i+1,j,k}$, $v_{i,j,k}$, $v_{i-1,j,k}$, $w_{i+1,j,k}$, $w_{i,j,k}$, and $w_{i-1,j,k}$. Looking at an interior grid point associated with the result vectors, say point (5,5,5), the above operands reside in MPs 1, 1, 2, 4, 5, 5, 5, 9, 9, and 9, respectively. Note that the vectors for $v_{i+1,j,k}$, $v_{i,j,k}$, and $v_{i-1,j,k}$, and the vectors for $w_{i+1,j,k}$, $w_{i,j,k}$, and $w_{i-1,j,k}$, are generated from MPs 5 and 9, respectively, using FLONET and the vector latching process described previously. The values of $u_{i,j+1,k}$ and $u_{i,j-1,k}$ are generated from MP1 using the Shift/Delay Unit. An illustration of the 14 operation ALU pipeline for calculating η, ζ, and F is presented in figure 7. Here, the values of η and ζ are calculated in independent pipelines, and results for these terms are stored in MP12 and MP13, respectively. These results are simultaneously routed through FLONET to the remainder of the pipeline for calculating F. Results for F are stored in MP10.

Figure 7. Block diagram of the ALU pipeline for the first step in the advection term calculation procedure.

Figure 8. Block diagram of the ALU pipeline for the second step in the advection term calculation procedure.

The second step of the advection term procedure is to calculate ξ, G, and H at the same grid points. From eqs. (15)-(17), calculation of these terms at grid point (i,j,k) requires the values of $u_{i,j,k}$, $v_{i,j,k}$, $v_{i,j,k+1}$, $v_{i,j,k-1}$, $w_{i,j,k}$, $w_{i,j+1,k}$, $w_{i,j-1,k}$, $\eta_{i,j,k}$, and $\zeta_{i,j,k}$. The values of the first seven operands reside in MPs 1, 5, 6, 8, 9, 9, and 9, respectively. From the previous step, the values of $\eta_{i,j,k}$ and $\zeta_{i,j,k}$ reside in MPs 12 and 13, respectively. The 16 operation ALU pipeline for performing the second step computations is illustrated in figure 8. Results for G and H are stored in MPs 3 and 7, respectively.

Upon  completion  of  the  above  steps,  the  three  components  of  the  advection  term  will 
have  been  computed  at  one-fourth  of  the  grid  points.  The  two  step  procedure  is  then 
repeated  3  times,  computing  F,  G,  and  H  at  grid  points  for  which  values  of  u  are  stored 
in  MPs  2,  3,  and  4. 

The calculation of ξ, η, and ζ requires boundary-point values for u, v, and w from neighboring Nodes. However, the step previous to calculation of the advection terms is the velocity correction procedure, for which the Boundary Cache of each Node will contain boundary-point values of P. Thus, prior to beginning the advection term computations, the Boundary Caches must be reloaded with the BPD data for u, v, and w. For the variable u, the computation requires boundary-point values at all (i,1,k), (i,J,k), (i,j,1), and (i,j,K) grid points, for a total of 2(I×J + I×K) values. Similarly, 2(J×I + J×K) and 2(K×I + K×J) boundary-point values of v and w are required, respectively. For a computational subdomain of dimension 256³, which is considered in an example at the end of this section, around $8\times10^5$ boundary-point values are required. Since the Boundary Cache has a 1 Mword storage capacity, all of the BPD data required by each Node may be stored within its Boundary Cache. Assuming that all of the boundary-point values are to be reloaded in the Boundary Cache before the computations are begun, the delay for this procedure is I×J + J×K + K×I clock periods. There are no other internode communication delays for this procedure.
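The counts above follow from which faces each vorticity component needs. u is read at j ± 1 and k ± 1, v at i ± 1 and k ± 1, and w at i ± 1 and j ± 1. The quoted delay is consistent with the earlier statement that one boundary value can be extracted from each of four switch elements per clock. This hypothetical helper just does that bookkeeping.

```python
def bpd_counts(I, J, K, values_per_clock=4):
    """Boundary-point values needed for the advection step, and reload clocks."""
    n_u = 2 * (I * J + I * K)     # u on faces k = 1, K and j = 1, J
    n_v = 2 * (J * I + J * K)     # v on faces k = 1, K and i = 1, I
    n_w = 2 * (K * I + K * J)     # w on faces j = 1, J and i = 1, I
    total = n_u + n_v + n_w       # = 4(IJ + JK + KI)
    clocks = total // values_per_clock
    return total, clocks


total, clocks = bpd_counts(256, 256, 256)
print(total, total / 2**20, clocks)   # values, fraction of 1 Mword, clock periods
```

For a 256³ subdomain this gives 786,432 values (about 75% of a 1 Mword Boundary Cache) and 196,608 clock periods, matching I×J + J×K + K×I.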

In order to project the total time required to compute the advection terms, the startup
time, the delay time associated with the internode communication procedure, and delays
resulting from Nodal procedures such as flushing the Double-Buffered Cache must be
determined. The total time to compute the advection terms is projected to be
184 + 8l + 2 IxJxK + IxJ + JxK + KxI clock periods. The operation count for the procedure
is 30 IxJxK. Neglecting the startup and delay costs, the nominal operation rate of the
procedure is 300 Mflops.
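The nominal rate follows from the ratio of operations to clock periods per grid point, since both scale with IxJxK. The sketch below assumes a 20 MHz clock (50 ns period). That value is not stated in this section, but it reproduces the nominal rates quoted in this section for the advection terms, Point Jacobi, filtering, and Red-Black SOR.

```python
CLOCK_MHZ = 20.0  # assumed clock rate; inferred from the quoted rates, not given in the text


def nominal_mflops(ops_per_point, clocks_per_point, clock_mhz=CLOCK_MHZ):
    """Nominal rate with startup and delay terms neglected.

    Both the operation count and the leading time term scale with IxJxK,
    so the grid size cancels: rate = (operations per clock period) * clock rate.
    """
    return ops_per_point / clocks_per_point * clock_mhz


if __name__ == "__main__":
    print(nominal_mflops(30, 2))      # advection terms: 300.0
    print(nominal_mflops(13.5, 0.5))  # Point Jacobi iteration: 540.0
```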

The  other  explicit  terms  to  be  calculated  in  the  algorithm  are  the  right  hand  sides  of  the 
three  Helmholtz  equations,  the  right  hand  side  of  the  Poisson  equation,  and  the  velocity 
correction  terms.  Relative  to  the  calculation  procedure  for  the  advection  terms,  these 
computations  are  all  quite  straightforward,  with  the  major  programming  concern  being 
to ensure that there are no memory conflicts in accessing the operand vectors. In fact,
due to the relative simplicity of the calculations, the procedures for each of the remaining
explicit terms may be implemented for two sets of vectors simultaneously. As an example,
consider the calculation procedure for the right-hand side of the Poisson equation. Upon
forward differencing eq. (11),

$$K_{ijk} = \left(u_{i+1,j,k} - u_{ijk} + v_{i,j+1,k} - v_{ijk} + w_{i,j,k+1} - w_{ijk}\right)/\Delta x \qquad (20)$$
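The forward-difference divergence of eq. (20) can be sketched in a few lines of Python. This is a single-domain illustration only: periodic wrap-around is assumed in place of the boundary-point data the NSC obtains from neighboring Nodes, and the nested-list layout is not the memory-plane layout described in the text.

```python
def divergence(u, v, w, dx):
    """Forward-difference K_ijk of eq. (20) on an n x n x n grid.

    u, v, w are nested lists indexed [i][j][k]. The i+1, j+1, k+1
    neighbours wrap around periodically (an assumption for this sketch).
    """
    n = len(u)
    K = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for i in range(n):
        ip = (i + 1) % n
        for j in range(n):
            jp = (j + 1) % n
            for k in range(n):
                kp = (k + 1) % n
                K[i][j][k] = (u[ip][j][k] - u[i][j][k]
                              + v[i][jp][k] - v[i][j][k]
                              + w[i][j][kp] - w[i][j][k]) / dx
    return K
```

Each point uses six operands, two from each velocity component, which matches the six memory-plane accesses listed for the NSC pipeline.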
Considering the calculation of K at grid points for which u is stored in MP1, the operands
are accessed from MPs 1, 1, 5, 5, 10, and 9, respectively. To calculate K at grid points
for which u is stored in MP3, the operands are accessed from MPs 3, 3, 7, 7, 12, and
11, respectively. As in previous examples, values of $u_{i+1,j,k}$ and $u_{ijk}$ are generated from a
single source using the register files of the ALU units, and values of $v_{i,j+1,k}$ and $v_{ijk}$ are
generated from a single source using the Shift/Delay Unit. Hence, it is possible to set up
two  independent  pipelines  for  calculating  K  for  two  distinct  vectors,  without  generating 
memory  plane  conflicts.  The  resulting  procedure  gives  a  nominal  operation  rate  of  240 
Mflops.  Calculation  procedures  for  the  right  hand  side  of  the  three  Helmholtz  equations 
and  for  the  velocity  correction  procedure  may  be  set  up  in  a  similar  manner,  giving  nominal 
operation  rates  for  these  procedures  of  400  Mflops  and  360  Mflops,  respectively. 

In advancing the solution from $t = t^n$ to $t = t^{n+1}$, the total time spent on calculation
of the explicit terms is projected to be 3564 + 36l + 13.5 IxJxK + 9(IxJ+JxK+KxI) clock
periods.  The  operation  count  for  these  computations  is  270  IxJxK.  Neglecting  the  startup 
and  delay  costs,  the  sustained  operation  rate  is  around  325  Mflops. 


4.2  Single  Grid  Solutions  of  the  Helmholtz  and  Poisson  Eqs. 

Single  grid  solution  of  the  Helmholtz  and  Poisson  equations  is  analyzed  for  the  Point 
Jacobi  and  Red-Black  SOR  iterative  schemes.  Beginning  with  Point  Jacobi,  consider  the 
solution  of  the  Helmholtz  equation  for  u.  Upon  central  differencing  eq.(12),  the  residual 
equation  for  a  Point  Jacobi  iteration  may  be  written  as 

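$$R^m_{ijk} = L_{ijk} - C_u\,u^m_{ijk} + C_k\left(u^m_{i+1,j,k} + u^m_{i-1,j,k} + u^m_{i,j+1,k} + u^m_{i,j-1,k} + u^m_{i,j,k+1} + u^m_{i,j,k-1}\right) \qquad (21)$$

where $L_{ijk}$ denotes the right-hand side, $C_u$ and $C_k$ are the coefficients of the discrete Helmholtz operator, and m denotes the iteration level. The update equation for Point Jacobi then becomes

$$u^{m+1}_{ijk} = u^m_{ijk} + \frac{R^m_{ijk}}{C_u} \qquad (22)$$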


Once again, the calculation procedure is begun at a grid point for which the values of u
are stored in MP1. Values of $L_{ijk}$, which are computed in the previous step, are accessed
from MP12. Operand vectors for $u_{ijk}$, $u_{i+1,j,k}$, $u_{i-1,j,k}$, $u_{i,j+1,k}$, and $u_{i,j-1,k}$ are generated from
MP1 using the Shift/Delay Unit and the vector latching process. Operands for $u_{i,j,k+1}$ and
$u_{i,j,k-1}$ are accessed from MPs 2 and 4, respectively. The ALU pipeline for performing the
Point  Jacobi  iteration  is  illustrated  in  figure  9.  Note  that  by  configuring  the  pipeline  in 
this  manner,  10  ALU  units  are  used  in  calculating  the  residual,  whereas  eq.(21)  indicates 
that  only  9  operations  are  required.  The  reason  for  configuring  the  pipeline  with  an  extra 
operation  will  become  apparent  presently.  As  in  the  example  of  section  2,  once  the  residual 
has  been  calculated,  the  values  are  routed  to  two  distinct  pipelines:  one  for  updating  the 
values of u, and the other for performing a check on convergence. In order to avoid
overwriting the values of $u^m$, which are required in subsequent computations, the updated
values $u^{m+1}$ are stored in MP5, rather than in their initial memory locations.
Upon updating the values of u from MP1, values of u from MPs 2, 3, and 4 are updated,
with  the  updated  values  stored  in  MPs  6,  7,  and  8,  respectively.  Then  in  the  next  iteration, 
operand  values  for  the  Point  Jacobi  update  are  accessed  from  MPs  5,  6,  7,  and  8,  and  the 
updated  values  are  stored  back  in  their  initial  memory  locations. 
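A scalar Python sketch of eqs. (21) and (22) may help make the data flow explicit. It reads only the old array and writes only the new one, which mirrors the alternation between MPs 1-4 and MPs 5-8 described above; after each sweep the two arrays swap roles. Periodic wrap-around at the edges and the sample coefficient values below are assumptions for illustration, not values from the text.

```python
def neighbour_sum(u, i, j, k):
    """Sum of the six nearest neighbours, with periodic wrap-around (assumed)."""
    n = len(u)
    return (u[(i + 1) % n][j][k] + u[(i - 1) % n][j][k]
            + u[i][(j + 1) % n][k] + u[i][(j - 1) % n][k]
            + u[i][j][(k + 1) % n] + u[i][j][(k - 1) % n])


def jacobi_sweep(u_old, u_new, L, Cu, Ck):
    """One Point Jacobi iteration, eqs. (21)-(22).

    Only u_old is read and only u_new is written, so u^m is never
    overwritten while still needed. Returns the sum of squared
    residuals, which a convergence check could use.
    """
    n = len(u_old)
    ssq = 0.0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                R = (L[i][j][k] - Cu * u_old[i][j][k]
                     + Ck * neighbour_sum(u_old, i, j, k))
                u_new[i][j][k] = u_old[i][j][k] + R / Cu
                ssq += R * R
    return ssq
```

For example, with hypothetical coefficients Cu = 4 and Ck = 0.5 (so that Cu > 6 Ck and the iteration contracts), repeating `jacobi_sweep(a, b, L, 4.0, 0.5)` followed by `a, b = b, a` converges to the discrete solution.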

Due  to  the  simplicity  of  the  Point  Jacobi  iteration  procedure,  values  of  u  can  actually  be 
updated  for  two  sets  of  data  simultaneously.  In  this  procedure,  two  independent  pipelines 
for  the  Point  Jacobi  iteration  are  configured,  using  2  type  1,  7  type  2,  and  4  type  3  ALU 
functional units. The extra operation in calculation of the residual values is included
because that is the easiest way to configure both pipelines without requiring more ALU
functional units than are available. In the computational procedure, values of u for MPs
1 and 2, and then for MPs 3 and 4, are updated simultaneously. This procedure is somewhat
complicated because both computations require operands from the same memory
planes. For instance, while $u_{ijk}$ from MP1 is being updated, the update of $u_{i,j,k+1}$ from MP2
proceeds simultaneously, and these values of u are required in both computations.
However, the multiple operand vectors which must be generated for these values need not
be offset, so the copies can be generated while the operands are routed through FLONET.
By performing the procedures simultaneously, the nominal operation rate for the Point
Jacobi procedure, including the in-line convergence check, is increased from an effective
rate of 260 Mflops to 540 Mflops.

In the convergence check of this procedure, a global check on the L2-norm of the
residual is implemented, rather than the in-line local convergence check of the example
in section 2. In this procedure, the residual values are squared, and then added to the
squared values of the residual from the simultaneous calculation proceeding in the other
pipeline (as indicated at the bottom right of figure 9). This value is then accumulated with
the values from previous calculations. Upon completion of the iteration, the accumulated
values from each Node are transferred to a single Node, where the values from all Nodes
are accumulated and the L2-norm is determined. A decision as to whether or not the
solution has converged globally is then made by the microsequencer of that Node.

Figure 9. Block diagram of the ALU pipeline for the Point Jacobi solution of the Helmholtz
equation for u.
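The reduction of the residual norm described above, with the squared residuals of the two pipelines summed per Node and then combined on a single Node, can be sketched as follows. The function names and the pairing of residuals are illustrative; using the $10^{-5}$ reduction factor of section 4.4 as the convergence test is an assumption about how the decision is made.

```python
import math


def node_partial(residual_pairs):
    """Accumulate one Node's contribution.

    Each pair holds the residuals produced at the same step by the two
    simultaneous pipelines; their squares are added and accumulated.
    """
    acc = 0.0
    for r1, r2 in residual_pairs:
        acc += r1 * r1 + r2 * r2
    return acc


def global_l2(partials):
    """Combine the per-Node sums on a single Node and take the square root."""
    return math.sqrt(sum(partials))


def converged(norm, initial_norm, reduction=1e-5):
    """Global convergence decision (assumed criterion: relative reduction)."""
    return norm <= reduction * initial_norm
```

Only one scalar per Node has to be communicated, which is why the global check adds little internode traffic.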

Relaxation  of  the  Helmholtz  equation  requires  2(IxJ+JxK+KxI)  boundary-point  values 
of  u  per  Node.  For  computational  subdomains  of  the  size  considered  in  this  analysis,  all 
of  the  BPD  data  may  be  stored  within  the  Boundary  Cache.  As  an  iteration  proceeds 
and  values  of  u  are  updated,  the  updated  boundary-point  values  are  immediately  routed 
to  their  destination  Nodes,  replacing  the  old  values.  By  performing  the  same  sequence  of 
computations  in  each  of  the  Nodes,  the  time  between  updating  a  boundary-point  value 
and  utilizing  that  value  in  the  destination  Node  is  at  least  IxJ  clock  periods.  Thus,  as  long 
as  subdomains  of  a  substantial  IxJ  dimension  are  mapped  into  each  Node,  the  BPD  data 
is  pre-communicated  long  before  it  is  required  in  an  ongoing  computation  of  a  destination 
Node,  but  is  never  communicated  before  the  old  values  have  been  utilized.  This  not 
only  ensures  proper  operation  of  the  iterative  method,  but  also  ensures  that  there  are  no 
internode  communication  delays  for  the  computational  procedure. 

The  computational  procedures  for  Point  Jacobi  solution  of  the  Helmholtz  equations 
for  v  and  w,  and  for  the  Poisson  equation,  are  performed  in  the  same  manner  as  above. 
The total computation time for performing a Point Jacobi iteration is projected to be
58 + 2l + 1/2 IxJxK clock periods. The operation count for the procedure is 13.5 IxJxK.
The  nominal  operation  rate  is  540  Mflops. 

As mentioned previously, the Poisson equation solution by Point Jacobi iteration requires the prefiltering of both P and the right-hand side. This procedure requires a global
computation to determine the amplitude of the filtered Fourier modes. However, the filtering technique is well suited to the Nodal architecture of the NSC, as filtering of both
modes,  for  both  P  and  the  right  hand  side,  may  be  performed  simultaneously.  The  time 
to  perform  the  filtering  is  projected  to  be  681+1/2  IxJxK  clock  periods.  The  operation 
count  is  10  IxJxK,  and  the  nominal  operation  rate  is  400  Mflops. 
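The text does not name the two filtered modes. The sketch below assumes they are the constant mode and the odd-even mode $(-1)^{i+j+k}$, the two Fourier modes that a Point Jacobi sweep for a periodic Poisson problem does not damp (their Jacobi eigenvalues are +1 and -1). Each amplitude is a global sum, which is why the filtering needs a global computation. Periodicity and an even grid size are assumptions of the sketch.

```python
def filter_modes(f):
    """Remove the constant and odd-even ((-1)**(i+j+k)) components of f in place.

    f is an n x n x n nested list with n even, so the two modes are
    orthogonal. Returns the two removed amplitudes.
    """
    n = len(f)
    npts = n ** 3
    s_const = 0.0
    s_odd_even = 0.0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                sign = -1.0 if (i + j + k) % 2 else 1.0
                s_const += f[i][j][k]
                s_odd_even += sign * f[i][j][k]
    a_const = s_const / npts
    a_odd_even = s_odd_even / npts
    for i in range(n):
        for j in range(n):
            for k in range(n):
                sign = -1.0 if (i + j + k) % 2 else 1.0
                f[i][j][k] -= a_const + sign * a_odd_even
    return a_const, a_odd_even
```

The same routine would be applied to both P and the right-hand side, which matches the text's remark that both can be filtered simultaneously.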

The  second  single  grid  iterative  method  considered  here  is  Red-Black  SOR.  This  is 
a  two-color  point  method  in  which  an  over-relaxed  Gauss-Seidel  type  iterative  scheme  is 
utilized.  A  complete  iteration  entails  updating  red  points  (i  +  j  +  k  odd)  first,  and  then 
updating  black  points  (i  +  j  +  k  even)  using  the  latest  available  values  in  calculating  the 
residual.  The  residual  equation  for  Red-Black  SOR  may  be  written  as 


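$$R^{\nu}_{ijk} = L_{ijk} - C_u\,u^m_{ijk} + C_k\left(u^{\nu}_{i+1,j,k} + u^{\nu}_{i-1,j,k} + u^{\nu}_{i,j+1,k} + u^{\nu}_{i,j-1,k} + u^{\nu}_{i,j,k+1} + u^{\nu}_{i,j,k-1}\right) \qquad (23)$$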

where $C_u$ and $C_k$ are the coefficients of eq. (21), $\nu = m$ for red points, and $\nu = m+1$ for black
points. Upon over-relaxation, the update equation becomes


$$u^{m+1}_{ijk} = u^m_{ijk} + \omega\,\frac{R^{\nu}_{ijk}}{C_u} \qquad (24)$$


where $\omega$ denotes the relaxation parameter. The optimal $\omega$ for this problem may be determined from the tilted grid Fourier analysis developed by LeVeque and Trefethen [15].
Although the ALU pipelines for the Point Jacobi and Red-Black SOR methods are nearly
identical, the computation procedures are significantly different. The reason for this is
that the memory plane addressing stride for Red-Black SOR requires that values of $u_{i\pm1,j,k}$,
$u_{i,j+1,k}$, $u_{i,j-1,k}$, $u_{i,j,k+1}$, and $u_{i,j,k-1}$ be stored in a different memory plane from the
value of $u_{ijk}$.
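A scalar Python sketch of one Red-Black SOR iteration, eqs. (23) and (24), shows why red and black points can each be updated as independent vectors: every neighbour of a red point is black and vice versa. Periodic wrap-around, an even grid size, and the sample parameters are assumptions of the sketch, not values from the text.

```python
def neighbour_sum(u, i, j, k):
    """Sum of the six nearest neighbours, with periodic wrap-around (assumed)."""
    n = len(u)
    return (u[(i + 1) % n][j][k] + u[(i - 1) % n][j][k]
            + u[i][(j + 1) % n][k] + u[i][(j - 1) % n][k]
            + u[i][j][(k + 1) % n] + u[i][j][(k - 1) % n])


def rb_sor_sweep(u, L, Cu, Ck, omega):
    """One Red-Black SOR iteration, eqs. (23)-(24), updating u in place.

    Red points (i+j+k odd) are updated first; black points (i+j+k even)
    then use the freshly updated red neighbours, i.e. level m+1 values.
    An even grid size keeps the two colours decoupled under wrap-around.
    """
    n = len(u)
    for parity in (1, 0):  # 1 = red (odd), 0 = black (even)
        for i in range(n):
            for j in range(n):
                for k in range(n):
                    if (i + j + k) % 2 != parity:
                        continue
                    R = (L[i][j][k] - Cu * u[i][j][k]
                         + Ck * neighbour_sum(u, i, j, k))
                    u[i][j][k] += omega * R / Cu
```

Skipping every other point in each loop corresponds to the constant stride of 2 in the NSC addressing, which halves the vector length relative to Point Jacobi.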

At the beginning of the Red-Black SOR procedure, copies of u from MPs 1, 2, 3, and 4
are stored in MPs 5, 6, 7, and 8, respectively. In updating the values of u for MP1, then,
$u_{ijk}$ is accessed from MP1; $u_{i+1,j,k}$, $u_{i-1,j,k}$, $u_{i,j+1,k}$, and $u_{i,j-1,k}$ are accessed from MP5; and
$u_{i,j,k+1}$ and $u_{i,j,k-1}$ are accessed from MPs 6 and 8, respectively. Since only red points are
updated in the first step of the procedure, the values of the operand vectors are accessed
with a constant stride of 2. The operands are then routed to the ALU in a similar manner
to the Point Jacobi procedure. Updated values of u are then routed back to the Double-Buffered Cache, and then routed to both MPs 1 and 5, replacing the old values. There
is  no  extra  cost  associated  with  the  procedure  for  routing  multiple  copies  of  a  result  to 
memory. 

Similar  procedures  are  then  utilized  in  updating  the  values  of  u  from  MPs  2,  3,  and  4, 
for  red  points  only.  In  the  second  half  of  the  procedure,  u  is  updated  at  black  points,  and 
the  operands  for  a  given  procedure  are  accessed  from  the  same  memory  planes  as  for  red 
points, but the constant stride addressing begins at a starting address offset by 1 from the
red point update procedure. With this sequence, values of u from MPs 5, 6, 7, and 8 will
always be maintained at the iteration level $\nu$, ensuring proper operation of the scheme.

With  the  constant  stride  of  2  in  addressing  the  data,  vector  lengths  for  this  procedure 
are  IxJxK/8,  rather  than  IxJxK/4  as  with  Point  Jacobi.  Furthermore,  with  the  more 
complicated  update  scheme  of  Red-Black  SOR,  it  is  not  possible  to  update  the  values  of  u 
for  more  than  one  set  of  vectors  at  a  time,  using  the  present  memory  allocation  procedure. 
Hence, this procedure runs at about half the speed of the Point Jacobi procedure, having a
projected computation time per iteration of 116 + 4l + IxJxK clock periods. The operation
count for this procedure is 13 IxJxK, and the nominal operation rate is 260 Mflops.

4.3  Multigrid  Components 

The  role  of  the  multigrid  method  is  to  accelerate  the  iterative  solution  of  the  Helmholtz 
and  Poisson  equations.  The  extra  steps  of  the  multigrid  procedure  involve  restriction  of  the 
solution  down  onto  coarser  grids,  relaxation  of  the  coarse  grid  solutions,  and  prolongation 
of  the  coarse  grid  solutions  up  onto  finer  grids.  The  restriction  operation  is  performed 
with  straight  injection,  the  relaxation  operation  with  under-relaxed  Point  Jacobi,  and  the 
prolongation operation with trilinear interpolation. In the remainder of this discussion, it
is assumed that g grid levels are to be used, where the dimension of grid $M_g$ is IxJxK,
grid $M_{g-1}$ is (I/2)x(J/2)x(K/2), grid $M_{g-2}$ is (I/4)x(J/4)x(K/4), etc.

A  primary  concern  in  implementing  the  multigrid  algorithm  on  the  NSC  is  the  memory 
plane allocation procedure for storing the array elements on the various grids. Consider
the solution of the Helmholtz equation for u. For the finest grid, the memory plane
allocation procedure for storing u is the same as described previously. Similar memory
plane allocation procedures are used on the other grids, where the procedures are set up
such that the MPs utilized on a coarse grid are different from those used on the next finer
grid. Therefore, values of u for grid $M_{g-1}$ are stored in MPs 5-8, for grid $M_{g-2}$ in MPs
9-12, etc. Values for the right-hand side of each grid are stored in a similar manner.
Using injection in the restriction operation, the transfer of data from fine to coarse
grids is performed using the Memory Interplane Routing Unit. Considering the injection
operation from grid $M_g$ to grid $M_{g-1}$, values of u for MP5 are accessed from MP1, for MP6
from MP3, for MP7 from MP1, and for MP8 from MP3. Due to the nonuniform nature of the
addressing stride required for these procedures, programming the Node to perform this
operation is quite complex. However, the intranode data transfer is fairly rapid, taking on
the order of IxJxK/16 clock periods.

Once  the  solution  has  been  restricted  to  a  coarse  grid,  a  specified  number  of  relaxations 
are  performed  on  the  solution.  The  relaxation  operation  on  coarse  grids  is  nearly  identical 
to the Point Jacobi iteration procedure described previously. The only significant differences are that the convergence check is not performed, and the residual values are saved
along with the updated solution. Considering grids $M_g$ and $M_{g-1}$, the relaxation procedure
requires on the order of IxJxK/2 and IxJxK/16 clock periods, respectively.

The  most  complicated  operation  involves  prolongation  of  the  coarse  grid  solutions  onto 
finer  grids.  The  difficulty  with  implementing  this  procedure  arises  from  the  nature  of  the 
interpolation  procedure,  where  the  number  of  results  generated  is  eight  times  larger  than 
the  number  of  operands.  Although  there  are  very  efficient  ways  to  configure  the  ALU 
to  calculate  multiple  results  simultaneously,  conflicts  arise  when  attempting  to  store  the 
results in memory. As a result, the time to prolong the solution from grid $M_{g-1}$ up onto
grid $M_g$ is on the order of IxJxK clock periods, and the nominal operation rate for the
procedure  is  around  120  Mflops. 
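Trilinear interpolation produces eight fine-grid values for every coarse-grid value, which is the source of the store conflicts noted above. The sketch below makes that 8:1 output ratio explicit. It assumes the coincident-point convention (fine index 2i over coarse index i) and periodic wrap-around, and it does not model the NSC memory planes.

```python
def _coarse_weights(i, nc):
    """Coarse indices and weights contributing to fine index i (periodic)."""
    c = i // 2
    if i % 2 == 0:
        return [(c, 1.0)]                   # fine point coincides with a coarse point
    return [(c, 0.5), ((c + 1) % nc, 0.5)]  # midpoint between two coarse points


def prolong(coarse):
    """Trilinear interpolation from an nc^3 grid to a (2 nc)^3 grid."""
    nc = len(coarse)
    nf = 2 * nc
    fine = [[[0.0] * nf for _ in range(nf)] for _ in range(nf)]
    for i in range(nf):
        wi = _coarse_weights(i, nc)
        for j in range(nf):
            wj = _coarse_weights(j, nc)
            for k in range(nf):
                wk = _coarse_weights(k, nc)
                fine[i][j][k] = sum(a * b * g * coarse[ci][cj][ck]
                                    for ci, a in wi
                                    for cj, b in wj
                                    for ck, g in wk)
    return fine
```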

4.4  Projected  Timing  Results 

The  projected  timing  results  for  performing  the  isotropic  turbulence  simulation  on  the 
NSC are based on the mapping of computational subdomains of dimension $256^3$ into each
Node. For a 64 Node NSC, the computational domain is $1024^3$, and contains roughly $10^9$
grid  points.  The  Point  Jacobi  and  Red-Black  SOR  algorithms  require  the  effective  storage 
of  11  variables  per  grid  point,  whereas  the  multigrid  method  requires  15  variables  per  grid 
point.  For  the  multigrid  method,  roughly  49%  of  the  available  storage  is  utilized. 

The projected timing results for implementing the three algorithms on the NSC are
presented in table 3. These results are based on a convergence criterion of reducing the L2-norm of the residual by a factor of $10^{-5}$. Based on the analytically determined asymptotic
convergence rates, the Point Jacobi and Red-Black SOR methods require roughly $8\times10^5$
and 1650 iterations, respectively, to attain convergence of the Poisson equation. Clearly,
both methods are inadequate for performing this simulation in a reasonable amount of
time. Furthermore, for problems of this size the convergence rate for Red-Black SOR
degrades significantly for non-optimal values of the relaxation parameter $\omega$. For more
general elliptic problems the optimal $\omega$ cannot be determined analytically. Thus, the
Red-Black SOR timing result is overly optimistic.
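The iteration counts quoted above follow from an asymptotic convergence factor per iteration: the number of iterations needed is the smallest n with $\text{rate}^n \le 10^{-5}$. The sketch below applies that rule to the Red-Black SOR rate listed in table 3 and to the per-cycle multigrid rate of 0.13 given in this section. Because the tabulated rates are rounded, the results are only approximate.

```python
import math


def iterations_needed(rate, reduction=1e-5):
    """Smallest n with rate**n <= reduction, for an asymptotic rate 0 < rate < 1."""
    return math.ceil(math.log(reduction) / math.log(rate))


if __name__ == "__main__":
    print(iterations_needed(0.993))  # Red-Black SOR, per iteration
    print(iterations_needed(0.13))   # multigrid, per V-cycle
```

This gives 1639 iterations for Red-Black SOR, consistent with the "roughly 1650" above given the three-digit rounding of the rate, and 6 V-cycles for multigrid.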

Results for the multigrid method are based on the use of 9 grid levels, the smallest being
$4^3$ (1 grid point per Node). A V-cycle is employed with 3 relaxations on the way down
and 1 on the way up, on each level. For the Poisson equation, the projected smoothing
rate for each cycle is 0.13; it then takes 6 cycles to attain convergence. The projected
time to perform the calculations for a full cycle is 3.04 sec. Of that time, 63% is spent on
relaxation, 18% on prolongation, 2% on restriction, and the remaining time is accounted
for by the filtering procedure. Although the time to advance the solution a complete time
step is fairly impressive, for a realistic simulation in which thousands of time steps are
required, the total computation time for the multigrid method is still quite substantial.
Less than 13% of that time is spent on evaluation of the explicit terms. Thus, more
efficient relaxation schemes for the Helmholtz and Poisson equation solvers would improve
the overall performance of the algorithm.

Table 3: Projected performance of algorithms on the NSC for a $1024^3$ grid. Time
denotes the time to advance the solution one time step, $\mu$ the smoothing rate for
solution of the Poisson equation.

Method           Time per step   μ       Poisson eq. fraction   Sustained operation rate (Mflops/Node)
Point Jacobi     214 hrs.        0.999   0.999
Red-Black SOR    70.1 min.       0.993   0.988                  259
Multigrid        1.77 min.       0.6     0.513                  405

As  one  would  expect,  the  sustained  operation  rates  for  the  single  grid  Point  Jacobi  and 
Red-Black  SOR  algorithms  approach  the  nominal  operation  rate  of  the  Poisson  equation 
solver. For the multigrid method, the sustained rate is less than 75% of that for single grid
Point  Jacobi,  reflecting  the  effects  of  the  restriction  and  prolongation  operations,  which 
operate  fairly  slowly.  On  a  64  Node  NSC,  the  multigrid  code  sustained  operation  rate  is 
projected  at  around  25  Gflops. 
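The grid hierarchy and aggregate figures above can be checked with simple arithmetic. The sketch is only such a check. Reading the tabulated multigrid value of 0.6 as a rate per relaxation, with 3 + 1 relaxations per level in each V-cycle, is an interpretation assumed here; the text does not state it.

```python
def level_sizes(n_finest, n_coarsest):
    """Grid dimension per level, halving from the finest to the coarsest."""
    sizes = [n_finest]
    while sizes[-1] > n_coarsest:
        sizes.append(sizes[-1] // 2)
    return sizes


NODES = 64
sizes = level_sizes(1024, 4)                      # 1024, 512, ..., 4
coarse_points_per_node = sizes[-1] ** 3 // NODES  # 4**3 points over 64 Nodes
per_relaxation_rate = 0.13 ** 0.25                # assumed: 4 relaxations per cycle
aggregate_gflops = NODES * 405 / 1000.0           # 405 Mflops/Node from table 3
```

The hierarchy has 9 levels with one coarsest-grid point per Node, and 64 Nodes at 405 Mflops each give about 25.9 Gflops.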

5.  CONCLUDING  REMARKS 

The  NSC  is  a  multi-purpose  parallel-processing  supercomputer  which  is  being  developed 
to  provide  an  efficient  means  for  simulating  large,  numerically  intensive,  scientific  problems. 
Rapid  solution  of  these  problems  is  attained  by  structuring  the  computational  procedures 
of  solution  algorithms  to  utilize  effectively  both  the  fine  grain  and  coarse  grain  parallelism 
inherent  in  the  NSC  architecture.  The  objective  of  this  paper  has  been  to  present  a  detailed 
description  of  the  procedures  involved  in  implementing  one  such  algorithm  on  the  NSC. 

Projected timing results for implementing an elementary algorithm for simulating
isotropic turbulence on the NSC indicate that operation rates of around 60% of the
peak speed are attainable. For a 64 Node NSC, the sustained operation rate would be
in excess of 25 Gflops. However, the timing results also indicate that the convergence rates
for single grid iterative methods are likely to be woefully inadequate for performing the
simulations in a reasonable amount of time. Consequently, a more desirable approach is
to incorporate multigrid methods into the iterative procedures. It must be noted, however,
that the accelerated convergence of the multigrid method is attained at the cost of
increased programming complexity, which is quite substantial for the NSC.

The algorithm described herein is perhaps the simplest algorithm with which transition and turbulence simulations may be performed. This is primarily due to the use of a
uniform Cartesian grid in discretizing the physical domain. The more complicated problem of transition in wall-bounded shear flows requires the use of grid stretching in the
direction  normal  to  the  wall;  a  spatially  developing  simulation  also  requires  grid  stretching 
in  the  streamwise  direction.  Although  the  present  algorithm  can  easily  be  modified  for 
this problem, efficient implementation of the method requires much more robust iterative
procedures  for  solving  the  Helmholtz  and  Poisson  equations  than  those  considered  in  this 
analysis.  Furthermore,  the  suitability  of  existing  multigrid  methods  for  computations  on 
a  grid  which  is  stretched  in  more  than  one  direction  is  not  yet  clear.  An  alternative  is  the 
use  of  conjugate  gradient  methods. 
ABSTRACT

Singularities in the solution destroy the accuracy of numerical
approximations to elliptic problems. Unfortunately, due to a pollution
effect, the accuracy deteriorates globally in the whole
domain. On equidistant meshes certain corrections can be used
in order to recover the high accuracy. In particular, the polluting
effect can be suppressed.

These corrections are studied for interesting model situations
like Laplace's equation with nonsmooth boundary data and
Poisson's equation with singular source terms. Theoretical as
well as experimental results demonstrate that $h^2$ convergence as
for smooth solutions is obtained.

The application of the corrections within the multigrid
method is discussed. In particular, the combination with Richardson
or $\tau$-extrapolation is shown to allow the highly accurate computation
of singular solutions.
1. Introduction.

Typical error estimates for the numerical solution of boundary value
problems depend on the smoothness of the true solution. In many error
estimates, norms of the solution derivatives play an essential role. Elliptic
boundary value problems may have singular solutions, however. The solution has
unbounded derivatives (or is itself unbounded) in the neighborhood of certain
points of the solution domain. Such singularities may be caused by a variety
of situations, which happen to be present in many practical applications.
Reasons for the emergence of singularities can, for example, be reentrant corners,
discontinuous coefficients, source terms with singularities (e.g., point loads,
dipoles, etc.), or singular functions in the boundary conditions.

Singularities in the solution can cause difficult numerical problems. If no
precautions are taken, the accuracy deteriorates depending on the strength
of the singularity. Problems with discontinuous coefficients may locally behave
like $r^\alpha$, where r is the distance from the singularity and the exponent is $\alpha = 0.1$
or worse. In this case the error expansion starts with terms $h^{2\alpha}$. This rate of
convergence is unacceptably slow. Additionally, there is the so-called
pollution effect: the deterioration in accuracy is not restricted to a neighborhood
of the singularity, but the accuracy is destroyed in the whole domain.

Furthermore  iterative  solvers  for  the  discrete  equations  may  become  less 
efficient.  This  has  also  been  observed  in  the  multigrid  context.  If  the  discrete 
solutions  are  poor  approximations  to  the  true  solution,  coarse  grid  solutions 
are poor approximations to fine grid solutions. Thus the coarse grid correction
process in a multigrid method is less effective.

One technique for avoiding the above problems is to use local grid refinement.
Though  this  is  the  standard  approach,  it  has  some  obvious  disadvantages. 
The  data  structures  get  much  more  complex,  making  the  programming  difficult. 
The  computational  overhead  for  processing  these  data  structures  may  be  quite 
large. 

So the question arises whether simple modifications of the discrete
equations on equidistant grids can recover satisfactory accuracy. For the
case of reentrant corners such modifications have been suggested in [12] and
[4]. The situation with discontinuous coefficients has been examined in [8].
In all these situations $O(h^2)$ accuracy could be recovered. The difference
equations were modified at only a single point of the regular grid.

This paper is concerned with the remaining cases: explicit singularities in
the boundary conditions and singularities in the source terms. We suggest
using equidistant grids with modifications of the boundary values (respectively
source terms) only. We will show analytically and experimentally that $O(h^2)$
convergence as for smooth solutions can be recovered at points far from the
singularity. The pollution effect is eliminated. For the case of singular boundary
conditions it will even be shown that the modification of one boundary
value is sufficient for getting $O(h^2)$ convergence.

For simplicity we will study Laplace's and Poisson's equations on rectangular
two-dimensional domains only. The basic idea, however, is not restricted
to these simple model problems. It can be extended to more general regions
and operators. Our results are formulated for the case of one isolated singularity
in the solution domain. Of course they can be generalized to cases with
several  singularities  by  using  the  superposition  principle. 

The  paper  is  organized  as  follows.  Section  2  introduces  some  notation  and 
cites  two  theorems  on  which  our  technique  is  based. 

The third section studies solutions with singularities in the boundary
conditions. Two theorems show that $O(h^2)$ convergence can be obtained by
either smoothing the boundary values in a fixed neighborhood of the singularity
or, alternatively, modifying one single boundary value. These theoretical
results are confirmed by numerical experiments. Richardson's extrapolation
may be used to improve the accuracy further.

In  the  fourth  section  we  examine  Poisson’s  equation  when  the  source 
terms  are  point  loads  or  derivatives  of  delta  functions.  One  problem  is  to 
define numerical equivalents of these unbounded source terms. It will be
shown that if the correct numerical analogues are chosen, the order of
convergence is $O(h^2)$.

The fifth section is devoted to $\tau$-extrapolation. This is a multigrid-specific
extrapolation technique. It is usually also based on smoothness assumptions
on the true solution. We will demonstrate that it can also be used for singular
solutions. In this case the type of singularity and the corresponding error
expansions must be considered when choosing the extrapolation parameters.
As another alternative, $\tau$-extrapolation may be combined with the modifications
introduced earlier, leading to very accurate approximations.

2.  Basic  Definitions  and  Theorems. 

In this section we introduce some basic notations and results. The
Laplace operator $\Delta$ in two dimensions is defined by

$$\Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}.$$
For studying difference equations we introduce the difference operators

$$\delta_x^2 u(x,y) = h^{-2}\left[u(x-h,y) - 2u(x,y) + u(x+h,y)\right]$$

$$\delta_y^2 u(x,y) = h^{-2}\left[u(x,y-h) - 2u(x,y) + u(x,y+h)\right]$$

The usual 5-point discretization of the Laplace operator is defined by

$$\Delta_h = \delta_x^2 + \delta_y^2 .$$

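In this paper we restrict our attention to the unit square

$$\Omega = (0,1)\times(0,1).$$

$\bar\Omega$ denotes the closure and $\partial\Omega$ the boundary of $\Omega$. The numerical approximations
are defined on equidistant grids with meshsize h:

$$\Omega_h = \{(ih, jh) : 0 < i,j < N,\ h = 1/N,\ i,j,N \text{ integer}\}.$$

$\bar\Omega_h$ is defined by

$$\bar\Omega_h = \{(ih, jh) : 0 \le i \le N,\ 0 \le j \le N,\ h = 1/N,\ i,j,N \text{ integer}\},$$

and the boundary of $\Omega_h$ by

$$\partial\Omega_h = \bar\Omega_h \setminus \Omega_h .$$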
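The grids and the 5-point operator defined above can be sketched in Python as follows. Representing u as a Python function of (x, y) and the helper names are illustrative assumptions.

```python
def grid_points(N):
    """Index pairs (i, j) of Omega_h (interior) and of its boundary."""
    closure = [(i, j) for i in range(N + 1) for j in range(N + 1)]
    interior = [(i, j) for (i, j) in closure if 0 < i < N and 0 < j < N]
    boundary = [(i, j) for (i, j) in closure if not (0 < i < N and 0 < j < N)]
    return interior, boundary


def laplace_h(u, N):
    """Apply Delta_h = delta_x^2 + delta_y^2 to u(x, y) at every point of Omega_h."""
    h = 1.0 / N
    interior, _ = grid_points(N)
    result = {}
    for i, j in interior:
        x, y = i * h, j * h
        dxx = (u(x - h, y) - 2.0 * u(x, y) + u(x + h, y)) / h ** 2
        dyy = (u(x, y - h) - 2.0 * u(x, y) + u(x, y + h)) / h ** 2
        result[(i, j)] = dxx + dyy
    return result
```

For example, $\Delta_h(x^2 + y^2) = 4$ at every interior point, since the 5-point operator is exact for quadratics; $\Omega_h$ has $(N-1)^2$ points and $\partial\Omega_h$ has $4N$.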

The following definition and theorem are taken from [12].

Definition: A family of functions $u_h(x,y)$ is called h-bounded on $\Omega$ if there
exists a real-valued, continuous function $r(x,y)$ on $\Omega$ (not necessarily bounded
on $\Omega$) such that for every $(x,y) \in \Omega$ there exists $h_1 > 0$ such that
$|u_h(x,y)| \le r(x,y)$ for all $h < h_1$, $h = 1/N$, N integer, $(x,y) \in \Omega_h$. If $r(x,y)$ is
bounded on $\Omega$, $u_h(x,y)$ is called strictly h-bounded.

The crucial point about h-boundedness is that an h-bounded family of grid functions $u_h(x,y)$ may be unbounded on $\Omega$ as $h \to 0$, but, because of the continuity of $r(x,y)$, it is bounded on any compact subset of $\Omega$, uniformly in $h$. This definition is used in

Theorem 2.1:

Let the solution of

$$\Delta_h u_h = f_h \quad \text{on } \Omega_h$$

be bounded on $\bar\Omega_h$. If $\delta_x^{2i}\delta_y^{2j} f_h$ is h-bounded on $\Omega$ for $0 \le i \le n$ and $0 \le j \le m$, then

$$\delta_x^{2n}\delta_y^{2m} u_h \text{ is h-bounded on } \Omega.$$

The proof is given in [12].

Asymptotic error expansions for the Laplace equation in rectangular domains are studied in [6]. Even when there are singularities on the boundary, the existence of asymptotic expansions can be proved. We present one result in

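Theorem 2.2:

Let $u^*$ be the solution of the boundary value problem

$$\Delta u = 0 \text{ in } \Omega, \qquad u = f(x) \text{ on } \partial\Omega,\ y = 0, \qquad u = 0 \text{ on } \partial\Omega,\ y \ne 0,$$

where $f(x)$ has a singularity at the point $(x_0, 0) \in \partial\Omega_h$:

$$f(x) = \operatorname{Im}\,[g(x)\,(x - x_0)^\alpha], \quad \alpha > 0,$$

with an analytic function $g(z)$ on $[0,1]$. Let $u_h$ be the numerical solution

$$\Delta_h u_h = 0 \text{ in } \Omega_h, \qquad u_h = f(x) \text{ on } \partial\Omega_h,\ y = 0, \qquad u_h = 0 \text{ on } \partial\Omega_h,\ y \ne 0.$$

Then there exists an asymptotic expansion

$$u_h = u^* + \sum_{m \ge 1} h^{m+\alpha}\, u_{m+\alpha} + \sum_{i \ge 1} h^{2i}\, u_{2i}.$$

For the proof see [6].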

3.  Singularities  in  the  Boundary  Conditions. 

In this section we study situations as described in Theorem 2.2 in some more detail. For these problems the analytically correct solution $u^*$ contains singular components:

$$u^*(x,y) = \operatorname{Im}\,[g(x+iy)\,(x - x_0 + iy)^\alpha] + \text{smooth terms}, \quad \alpha > 0,\ x_0 \in [0,1],$$

with $g(z)$ analytic on the closure of $\Omega$. $u^*(x,y)$ is harmonic in $\Omega$. Provided $u^*$ additionally satisfies the boundary conditions, it must be the correct solution. $u^*$ has unbounded derivatives on $\Omega$, so no theorem based on the smoothness of the true solution is applicable. There are general results giving convergence orders for the corresponding finite difference solutions. For example, $O(h^\alpha)$ accuracy is guaranteed by the theorems in [2]. However, if the singularity is located on the boundary, the situation is not that bad. Hofmann's theory (see Theorem 2.2 and [6]) shows that the numerical solution $u_h$ has an asymptotic expansion starting with a term of order $O(h^{1+\alpha})$ at fixed points in the interior of $\Omega$ if $0 < \alpha < 1$. The essential ideas of Hofmann's proof are also used in Theorem 3.2 below.

With the following theorems we demonstrate that $O(h^2)$ convergence (as for smooth solutions) can be recovered if the boundary values $f(x,y)$ are modified to $\tilde f_h(x,y)$. This convergence rate is valid only for fixed points in $\Omega$, not for points approaching the singularity as $h \to 0$. The first theorem is based on studying a related, smooth problem.

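Theorem 3.1:

Let $u^*$ be the solution of the boundary value problem

$$\Delta u = 0 \text{ in } \Omega, \qquad u = f(x,y) \text{ on } \partial\Omega,$$

where $f(x,y)$ has a singularity at $(x_0, 0) \in \partial\Omega$:

$$f(x,y) = \operatorname{Im}\,[g(x+iy)\,(x - x_0 + iy)^\alpha], \quad \alpha > 0,$$

with $g(z)$ analytic on $\bar\Omega$, $0 < x_0 < 1$. (Under these conditions $f(x,y)$ defines a harmonic function on $\Omega$ satisfying the boundary conditions, and thus $u^*(x,y) = f(x,y)$.) Let $u_h$ be the solution of the modified numerical analogue

$$\Delta_h u_h = 0 \text{ in } \Omega_h, \qquad u_h = \delta_x^2 F(x,y) \text{ on } \partial\Omega_h,\ y = 0, \qquad u_h = f(x,y) \text{ on } \partial\Omega_h,\ y \ne 0,$$

where $F(x,y)$ is a function satisfying

$$\frac{\partial^2}{\partial x^2} F(x,y) = f(x,y).$$

Then

$$u_h = u^* + h^2 r_h,$$

where $r_h$ is h-bounded on $\Omega$.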
Proof: Consider the related problem

$$\Delta U = 0 \text{ in } \Omega, \qquad U = F(x,y) \text{ on } \partial\Omega,$$

with the exact solution $U^*$ (so that $\frac{\partial^2}{\partial x^2} U^* = u^*$) and the numerical analogue

$$\Delta_h U_h = 0 \text{ in } \Omega_h, \qquad U_h = F(x,y) \text{ on } \partial\Omega_h.$$

If $R_h$ is defined by

$$U_h = U^* + h^2 R_h,$$

then

$$\Delta_h R_h = -\frac{1}{h^2}\,\Delta_h U^* \text{ in } \Omega_h, \qquad R_h = 0 \text{ on } \partial\Omega_h.$$

The right-hand side of this problem satisfies

1. $\frac{1}{h^2}\,\Delta_h U^*$ is h-bounded in $\Omega$,

2. $\delta_x^2\,\frac{1}{h^2}\,\Delta_h U^*$ is h-bounded in $\Omega$.

This can be seen by using Taylor expansions and smoothness properties of $U^*$ for estimating the remainder terms. Furthermore, $R_h$ is bounded on $\bar\Omega_h$ (see e.g. [6]).

Using Theorem 2.1, this implies that $\delta_x^2 R_h$ is h-bounded in $\Omega$.

Now define $\tilde U_h = \delta_x^2 U_h$. ($U_h$ can be extended to points outside $\bar\Omega_h$ such that $\Delta_h U_h = 0$ at points on the boundary, and thus $\tilde U_h$ is defined in $\bar\Omega_h$ including the boundary.) $\tilde U_h$ is the solution of a finite difference equation

$$\Delta_h \tilde U_h = 0 \text{ in } \Omega_h,$$

$$\tilde U_h = \delta_x^2 F(x,y) \text{ on horizontal boundary lines},$$

$$\tilde U_h = \delta_x^2 F(x,y) + O(h^2) \text{ on vertical boundary lines}.$$

Moreover,

$$\tilde U_h - u^* = \delta_x^2 (U_h - U^*) + (\delta_x^2 U^* - u^*) = h^2\,(\delta_x^2 R_h + q_h),$$

with $q_h = h^{-2}(\delta_x^2 U^* - u^*)$. Both $\delta_x^2 R_h$ and $q_h$ are h-bounded on $\Omega$. It remains to show that $\tilde U_h - u_h$ is of order $O(h^2)$. This, however, can easily be seen by comparing the corresponding numerical boundary value problems. They differ only in the boundary values, and this difference is of order $O(h^2)$. The discrete maximum principle guarantees that $\tilde U_h - u_h$ is $h^2$ times an h-bounded grid function. This completes the proof. □
Remark: Theorem 3.1 can be sharpened to

$$u_h = u^* + h^2 u_1 + O(h^{2+\alpha})$$

using the more involved techniques of the following theorem.

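Note that the modification of the boundary values to $\tilde f$ need only be done in a fixed neighborhood of the singularity. A suitable $\tilde f$ can be constructed using one of the following formulas:

$$\tilde f(x) = \frac{1}{h^2}\int_{x-h/2}^{x+h/2}\int_{\eta-h/2}^{\eta+h/2} f(\tau)\,d\tau\,d\eta$$

or, equivalently,

$$\tilde f(x) = \int K(x-\xi, h)\, f(\xi)\,d\xi,$$

where the kernel $K(x,h)$ is given by

$$K(x,h) = \begin{cases} \dfrac{h - |x|}{h^2} & \text{for } |x| \le h, \\ 0 & \text{otherwise.} \end{cases}$$

The result coincides with $\delta_x^2 F(x)$ in Theorem 3.1, since $\delta_x^2 F(x) = \int K(x-\xi,h)\,F''(\xi)\,d\xi$. This procedure is a smoothing of the boundary values. The integral kernel is a linear B-spline. If B-splines of higher order are used, the boundary values are smoothed more; then $h^2$ expansions of the error up to correspondingly higher orders can be obtained.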
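The two formulas can be checked against each other, and against the difference form $\delta_x^2 F$ of Theorem 3.1, with a short Python sketch. The helper names, the midpoint rule used for the convolution, and the hand-computed double antiderivative $F(x) = \tfrac{4}{15}(x-x_0)_+^{5/2}$ of $f(x) = (x-x_0)_+^{1/2}$ are our own assumptions for illustration. At the singular point both forms give $\tilde f(x_0) = \tfrac{4}{15}h^{1/2}$; for the smooth function $f(x) = x^2$ both give $x^2 + h^2/6$.

```python
import math
from typing import Callable


def smoothed_by_difference(F: Callable[[float], float], x: float, h: float) -> float:
    """f~(x) = delta_x^2 F(x), where F is any function with F'' = f."""
    return (F(x - h) - 2.0 * F(x) + F(x + h)) / h**2


def smoothed_by_kernel(f: Callable[[float], float], x: float, h: float, m: int = 20000) -> float:
    """f~(x) = integral of K(x - xi, h) f(xi) d xi with the linear B-spline
    K(t, h) = (h - |t|) / h^2 on |t| <= h, approximated by the midpoint rule."""
    step = 2.0 * h / m
    total = 0.0
    for k in range(m):
        xi = x - h + (k + 0.5) * step
        total += (h - abs(x - xi)) / h**2 * f(xi)
    return total * step


X0 = 0.5  # location of the boundary singularity in this example


def f_sing(x: float) -> float:
    return math.sqrt(max(x - X0, 0.0))  # f(x) = (x - x0)_+^(1/2)


def F_sing(x: float) -> float:
    return 4.0 / 15.0 * max(x - X0, 0.0) ** 2.5  # F'' = f_sing


h = 1.0 / 8
print(smoothed_by_difference(F_sing, X0, h))  # (4/15) h^(1/2), about 0.0943
print(smoothed_by_kernel(f_sing, X0, h))      # agrees up to the quadrature error
```

The difference form is exact and cheap whenever $F$ is available in closed form; the kernel form only needs $f$ itself.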

In the following we will discuss an alternative technique and proof. This approach leads to one-point modifications. It is interesting that it is possible to obtain the improvement in convergence by modifying not all values along a boundary line, but at a single point only. This time the proof is based on explicit representations of the true and numerical solution as Fourier series. Our result is stated in
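Theorem 3.2:

Let $u^*$ be the solution of the boundary value problem

$$\Delta u = 0 \text{ in } \Omega, \qquad u = f(x) \text{ on } \partial\Omega,\ y = 0, \qquad u = 0 \text{ on } \partial\Omega,\ y \ne 0,$$

where $f(x)$ has a singularity at $x_0 \in (0,1)$, $x_0 = Mh$, $M$ integer:

$$f(x) = \begin{cases} g(x)\,(x - x_0)^\alpha & \text{for } x > x_0, \\ 0 & \text{otherwise.} \end{cases}$$

Assume that $g(x)$ is analytic on $[x_0, 1]$ and $0 < \alpha < 2$. Let $u_h$ be the solution of the modified numerical analogue

$$\Delta_h u_h = 0 \text{ in } \Omega_h, \qquad u_h = \tilde f(x) \text{ on } \partial\Omega_h,\ y = 0, \qquad u_h = 0 \text{ on } \partial\Omega_h,\ y \ne 0.$$

Here $\tilde f(x)$ is defined by

$$\tilde f(x) = \begin{cases} -h^\alpha\, g(x_0)\,\zeta(-\alpha) & \text{for } x = x_0, \\ f(x) & \text{otherwise,} \end{cases}$$

where $\zeta$ is Riemann's zeta function.

Then $u_h$ has an asymptotic expansion

$$u_h = u^* + h^2 u_1 + O(h^{2+\alpha}).$$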
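The modified value $\tilde f(x_0) = -h^\alpha g(x_0)\zeta(-\alpha)$ requires $\zeta$ at a negative argument. A minimal Python sketch, assuming only the standard library, evaluates it through the functional equation of the zeta function, with an Euler-Maclaurin tail estimate for $\zeta(1-s)$. The function names are illustrative and hypothetical, not from the paper. For $\alpha = 1/2$ and $g \equiv 1$ it reproduces the constant $0.207886$ used for method 3 in the experiments.

```python
import math


def zeta_above_one(s: float, n: int = 50) -> float:
    """Riemann zeta(s) for real s > 1: partial sum up to n-1 plus an
    Euler-Maclaurin estimate of the tail sum_{k >= n} k^-s."""
    partial = sum(k ** -s for k in range(1, n))
    tail = n ** (1 - s) / (s - 1) + 0.5 * n ** -s
    tail += s / 12 * n ** (-s - 1)
    tail -= s * (s + 1) * (s + 2) / 720 * n ** (-s - 3)
    return partial + tail


def zeta_negative(s: float) -> float:
    """zeta(s) for real s < 0 via the functional equation
    zeta(s) = 2^s pi^(s-1) sin(pi s / 2) Gamma(1 - s) zeta(1 - s)."""
    return (2 ** s * math.pi ** (s - 1) * math.sin(math.pi * s / 2)
            * math.gamma(1 - s) * zeta_above_one(1 - s))


def modified_boundary_value(h: float, alpha: float, g_x0: float) -> float:
    """One-point modification of Theorem 3.2: f~(x0) = -h^alpha g(x0) zeta(-alpha), 0 < alpha < 2."""
    return -(h ** alpha) * g_x0 * zeta_negative(-alpha)


print(zeta_negative(-0.5))                     # about -0.2078862
print(modified_boundary_value(1.0, 0.5, 1.0))  # about 0.2078862, the factor of h^0.5
```

Since $0 < \alpha < 2$, the argument $1 + \alpha$ of the auxiliary zeta evaluation lies in $(1,3)$, where the Euler-Maclaurin tail converges rapidly.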

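Proof: According to Hofmann [6], the true solution and the numerical solution can be represented by

$$u^*(x,y) = \sum_{n=1}^{\infty} a_n \sin(n\pi x)\, T(n,y) \quad \text{for } y > 0,$$

$$u_h(x,y) = \sum_{n=1}^{N-1} b_n(N) \sin(n\pi x)\, T(\mu_n,y) \quad \text{for } y > 0,$$

respectively, with $h = 1/N$. $T(n,y)$ is defined by

$$T(n,y) = \frac{\sinh(n\pi(1-y))}{\sinh(n\pi)}.$$

The numbers $a_n$ are Fourier coefficients

$$a_n = 2\int_0^1 f(x)\sin(n\pi x)\,dx.$$

The numbers $b_n(N)$ are trapezoidal sums

$$b_n(N) = \frac{2}{N}\sum_{i=1}^{N-1} \tilde f\!\left(\frac{i}{N}\right)\sin\!\left(\frac{n\pi i}{N}\right),$$

and $\mu_n$ is defined by the equation

$$\sinh\!\left(\frac{\mu_n\pi}{2N}\right) = \sin\!\left(\frac{n\pi}{2N}\right).$$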
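The role of $\mu_n$ can be checked numerically: with $\mu_n$ defined in this way, each term $\sin(n\pi x)\,T(\mu_n, y)$ is exactly discrete-harmonic, i.e. annihilated by the 5-point operator $\Delta_h$, whereas $\sin(n\pi x)\,T(n, y)$ is harmonic only for $\Delta$. Also $\mu_n \to n$ as $N \to \infty$. The Python sketch below (helper names are ours) evaluates the residual at the grid points of $\Omega_h$.

```python
import math


def mu(n: int, N: int) -> float:
    """mu_n from sinh(mu_n * pi / (2N)) = sin(n * pi / (2N))."""
    return 2 * N / math.pi * math.asinh(math.sin(n * math.pi / (2 * N)))


def T(m: float, y: float) -> float:
    """T(m, y) = sinh(m pi (1 - y)) / sinh(m pi), so T(m, 0) = 1 and T(m, 1) = 0."""
    return math.sinh(m * math.pi * (1 - y)) / math.sinh(m * math.pi)


def max_residual(n: int, N: int, m: float) -> float:
    """Max over interior grid points of |Delta_h v| for v(x, y) = sin(n pi x) T(m, y)."""
    h = 1.0 / N

    def v(i: int, j: int) -> float:
        return math.sin(n * math.pi * i * h) * T(m, j * h)

    worst = 0.0
    for i in range(1, N):
        for j in range(1, N):
            lap = (v(i - 1, j) + v(i + 1, j) + v(i, j - 1) + v(i, j + 1) - 4.0 * v(i, j)) / h**2
            worst = max(worst, abs(lap))
    return worst


N, n = 16, 3
print(max_residual(n, N, mu(n, N)))  # rounding level: discrete-harmonic
print(max_residual(n, N, float(n)))  # clearly nonzero
print(mu(1, 1000))                   # close to 1
```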

Following [6], the discretization error can be split into three terms:

$$u^*(x,y) - u_h(x,y) = I_1(x,y,h) + I_2(x,y,h) + I_3(x,y,h),$$

where

$$I_1(x,y,h) = \sum_{n=N}^{\infty} a_n \sin(n\pi x)\, T(n,y),$$

$$I_2(x,y,h) = \sum_{n=1}^{N-1} a_n \sin(n\pi x)\,[T(n,y) - T(\mu_n,y)],$$

$$I_3(x,y,h) = \sum_{n=1}^{N-1} (a_n - b_n(N)) \sin(n\pi x)\, T(\mu_n,y).$$

The first two terms have asymptotic expansions in terms of $h^2$ if $f(x)$ is bounded, piecewise continuous, and satisfies

$$f(x) = \frac{1}{2}\left[\lim_{t\to 0^-} f(x+t) + \lim_{t\to 0^+} f(x+t)\right] \quad \text{for } 0 < x < 1,$$

$$f(0) = \lim_{x\to 0^+} f(x), \qquad f(1) = \lim_{x\to 1^-} f(x).$$

In contrast, $I_3$ depends on further properties of $f$. In the above representation of $I_3$ the numerical integration errors

$$a_n - b_n(N) = 2\int_0^1 f(x)\sin(n\pi x)\,dx - \frac{2}{N}\sum_{i=1}^{N-1}\tilde f\!\left(\frac{i}{N}\right)\sin\!\left(\frac{n\pi i}{N}\right)$$

play an essential role.

In fact, Hofmann proves that if the numerical integration error of the Fourier coefficients has an error expansion of the form

$$a_n - b_n(N) = \sum_{i=1}^{p-1} c_{n,i}\, h^{\tau_i} + O(h^{\tau_p}),$$

then $I_3$ (and thus the discretization error) can be expanded to

$$u^*(x,y) - u_h(x,y) = \sum_{i} s_i(x,y)\, h^{\sigma_i} + O(h^{\tau_p}),$$

where the $\sigma_i$ are members of the finite set of numbers

$$\{\,2j + \tau_i : j \ge 0,\ i \ge 0,\ i + j \ne 0,\ 2j + \tau_i < \tau_p\,\}, \qquad \tau_0 = 0.$$


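Thus the problem reduces to determining the error of a numerical integration. We can use a result of [11], which gives a generalization of the Euler-Maclaurin formula. If $f(x)$ is analytic in $0 \le x \le 1$ and $\alpha > 0$, then

$$\int_0^1 x^\alpha f(x)\,dx - \frac{1}{N}\left[\sum_{k=1}^{N-1}\left(\frac{k}{N}\right)^{\alpha} f\!\left(\frac{k}{N}\right) + \frac{f(1)}{2}\right] = -\sum_{j \ge 0}\frac{\zeta(-\alpha-j)}{j!}\,\frac{f^{(j)}(0)}{N^{1+\alpha+j}} - \sum_{k \ge 1}\frac{B_{2k}}{(2k)!}\,\frac{1}{N^{2k}}\,\frac{d^{2k-1}}{dx^{2k-1}}\left[x^\alpha f(x)\right]_{x=1},$$

where $B_{2k}$ are Bernoulli numbers (the series are asymptotic). Using $x_0 = Mh$, we obtain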
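$$a_n - b_n(N) = 2\int_{x_0}^1 f(x)\sin(n\pi x)\,dx - \frac{2}{N}\sum_{i=M+1}^{N-1} f\!\left(\frac{i}{N}\right)\sin\!\left(\frac{n\pi i}{N}\right) - \frac{2}{N}\,\tilde f(x_0)\sin(n\pi x_0)$$

$$= 2\int_{x_0}^1 g(x)\,(x-x_0)^\alpha \sin(n\pi x)\,dx - \frac{2}{N}\sum_{i=M+1}^{N-1} g\!\left(\frac{i}{N}\right)\left(\frac{i}{N}-x_0\right)^{\alpha}\sin\!\left(\frac{n\pi i}{N}\right) - \frac{2}{N}\,\tilde f(x_0)\sin(n\pi x_0).$$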
4 
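Substituting $x = x_0 + \xi(1-x_0)$, so that the step in $\xi$ is $h/(1-x_0)$, and applying the generalized Euler-Maclaurin formula to the integrand $\xi^\alpha G_2(\xi)$, where

$$G_1(\xi) = \xi^\alpha\, g[x_0 + \xi(1-x_0)]\,\sin[n\pi(x_0 + \xi(1-x_0))]$$

and

$$G_2(\xi) = g[x_0 + \xi(1-x_0)]\,\sin[n\pi(x_0 + \xi(1-x_0))],$$

we obtain

$$a_n - b_n(N) = -2\,G_2(0)\,\zeta(-\alpha)\,h^{1+\alpha} - 2h\,\tilde f(x_0)\sin(n\pi x_0) - 2(1-x_0)^{1+\alpha}\,\frac{B_2}{2}\,G_1'(1)\,\frac{h^2}{(1-x_0)^2} + O(h^{2+\alpha}).$$

Because of our definition of $\tilde f(x_0)$, and since $G_2(0) = g(x_0)\sin(n\pi x_0)$, the $O(h^{1+\alpha})$ terms cancel. As we assumed $0 < \alpha < 2$, the next terms in the expansions are of order $O(h^2)$ and $O(h^{2+\alpha})$. This completes the proof. □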
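The generalized Euler-Maclaurin formula used in the proof can be checked on the simplest singular integrand $x^{1/2}$, i.e. $\alpha = 1/2$ and $f \equiv 1$. Then all $\zeta$ terms except $j = 0$ vanish, and the quadrature error should be $-\zeta(-1/2)N^{-3/2} - \tfrac{1}{24}N^{-2}$ up to $O(N^{-4})$. The Python sketch below is our own check, with $\zeta(-1/2) \approx -0.2078862$ hard-coded; it is not part of the original proof.

```python
import math

ZETA_MINUS_HALF = -0.2078862249773545  # zeta(-1/2)


def integral_minus_sum(N: int) -> float:
    """int_0^1 x^(1/2) dx minus (1/N) [sum_{k=1}^{N-1} (k/N)^(1/2) + 1/2]."""
    h = 1.0 / N
    quad = h * (sum(math.sqrt(k * h) for k in range(1, N)) + 0.5)
    return 2.0 / 3.0 - quad


def predicted(N: int) -> float:
    """Leading terms for alpha = 1/2, f = 1: -zeta(-1/2) N^(-3/2) from the singular end x = 0,
    and -(B_2 / 2!) N^-2 * (d/dx x^(1/2) at x = 1) = -N^-2 / 24 from the regular end."""
    return -ZETA_MINUS_HALF * N ** -1.5 - N ** -2.0 / 24.0


for N in (10, 100, 1000):
    print(N, integral_minus_sum(N), predicted(N))
```

The remaining discrepancy comes from the $B_4$ term, about $5\cdot 10^{-4}N^{-4}$, which is far below the leading $N^{-3/2}$ term.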

Numerical  Experiments. 

Now we demonstrate the effectiveness of our techniques with numerical experiments. The test problem is

$$\Delta u = 0 \text{ on } \Omega, \qquad u(x,y) = \operatorname{Re}\,[(x+iy-0.5)^{0.5}] \text{ on } \partial\Omega,$$

such that the analytically correct solution is

$$u^*(x,y) = \operatorname{Re}\,[(x+iy-0.5)^{0.5}] = r^{0.5}\cos(\varphi/2),$$

where

$$r = |x+iy-0.5|, \qquad \varphi = \arg(x+iy-0.5).$$

This  function  is  shown  in  the  figure  below. 


We compare three different numerical methods. Method 1 uses the unmodified boundary conditions. Here the results of Hofmann predict $O(h^{1.5})$ convergence. For method 2 the boundary values are replaced by smoothed ones according to Theorem 3.1. Method 3 finally uses a single modification at the point $(0.5, 0)$ according to Theorem 3.2. In particular, $\tilde f(1/2, 0) = -\zeta(-0.5)\,h^{0.5} = 0.207886\,h^{0.5}$ replaces $f(1/2, 0) = 0$. The calculations are performed on grids with $h = 1/8, 1/16, 1/32, 1/64, 1/128$. The following table shows the discretization errors $u_h(x,y) - u(x,y)$ evaluated at points $(x,y) \in \Omega_h$. The last column gives the average convergence order.
Discretization Errors for Different Representations of the Boundary Values.

(x,y)        method   h=1/8      h=1/16     h=1/32     h=1/64     h=1/128    conv. order
(1/2, 1/2)   1        .101e-1    .324e-2    .108e-2    .369e-3    .127e-3    1.58
             2        .864e-3    .212e-3    .528e-4    .132e-4    .330e-5    2.00
             3        .209e-2    .495e-3    .122e-3    .305e-4    .763e-5    2.02
(1/4, 1/2)   1        .636e-2    .214e-2    .720e-3    .244e-3    .837e-4    1.56
             2        .639e-3    .174e-3    .447e-4    .112e-4    .281e-5    1.95
             3        .150e-2    .402e-3    .103e-3    .263e-4    .664e-5    1.95
(1/2, h)     1        .645e-1    .462e-1    .328e-1    .232e-1    .164e-1    0.48
             2        .243e-3    .121e-3    .800e-4    .558e-4    .393e-4    0.65
             3        .124e-1    .874e-2    .618e-2    .437e-2    .309e-2    0.49
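The last column can be reproduced, up to the rounding of the tabulated errors, by averaging the observed order over the four mesh halvings, $\log_2(|e_{1/8}|/|e_{1/128}|)/4$. We assume that this is how the column was computed; the short Python sketch below applies the formula to two rows of the table.

```python
import math


def average_order(errors: list) -> float:
    """Average observed order for errors on meshes h, h/2, h/4, ...:
    log2(|e_first| / |e_last|) divided by the number of halvings."""
    return math.log2(abs(errors[0]) / abs(errors[-1])) / (len(errors) - 1)


method1 = [0.101e-1, 0.324e-2, 0.108e-2, 0.369e-3, 0.127e-3]  # (1/2, 1/2), method 1
method3 = [0.209e-2, 0.495e-3, 0.122e-3, 0.305e-4, 0.763e-5]  # (1/2, 1/2), method 3
print(round(average_order(method1), 2), round(average_order(method3), 2))  # 1.58 2.02
```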

At the points in the interior of the domain the predicted convergence orders can be clearly observed. Methods 2 and 3 both converge with $O(h^2)$, though method 2 yields somewhat lower errors. Method 1 demonstrates the pollution effect: even far from the singularity the convergence deteriorates to $O(h^{1.5})$. Further note that at the point approaching the singularity all methods converge with $O(h^{0.5})$ only.
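The following self-contained Python sketch repeats the comparison of methods 1 and 3 at $(1/2, 1/2)$ on the two coarsest grids. It is our own reconstruction, not the software used for the table: the discrete problem is solved by SOR (relaxation factor 1.6, an arbitrary choice) instead of multigrid, and the boundary data are taken from $u^*$ at the grid points. It is meant to show the qualitative effect of the one-point modification (a markedly smaller error for method 3), not to reproduce the tabulated digits.

```python
import cmath
import math

ZETA_MINUS_HALF = -0.2078862249773545  # zeta(-1/2)


def exact(x: float, y: float) -> float:
    """u*(x, y) = Re[(x + iy - 0.5)^0.5] with the principal branch of the square root."""
    return cmath.sqrt(complex(x - 0.5, y)).real


def solve(n: int, modified: bool, omega: float = 1.6, tol: float = 1e-13) -> list:
    """5-point Laplace equation on the unit square, h = 1/n, Dirichlet data u*, solved by SOR.
    modified=False: method 1 (boundary values u* at the grid points).
    modified=True:  method 3 (value at (0.5, 0) replaced by -zeta(-1/2) h^0.5)."""
    h = 1.0 / n
    u = [[0.0] * (n + 1) for _ in range(n + 1)]  # u[i][j] approximates u(ih, jh)
    for i in range(n + 1):
        for j in range(n + 1):
            if i in (0, n) or j in (0, n):
                u[i][j] = exact(i * h, j * h)
    if modified:
        u[n // 2][0] = -ZETA_MINUS_HALF * math.sqrt(h)
    while True:
        largest = 0.0
        for i in range(1, n):
            for j in range(1, n):
                target = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])
                update = omega * (target - u[i][j])
                u[i][j] += update
                largest = max(largest, abs(update))
        if largest < tol:
            return u


def error_at_center(n: int, modified: bool) -> float:
    """Discretization error u_h - u* at (1/2, 1/2); n must be even."""
    return solve(n, modified)[n // 2][n // 2] - exact(0.5, 0.5)


for n in (8, 16):
    print(f"h = 1/{n}: method 1 {error_at_center(n, False):+.3e}, "
          f"method 3 {error_at_center(n, True):+.3e}")
```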

Richardson extrapolation can be used on these results. In method 1 the $O(h^{1.5})$ errors can be eliminated, in methods 2 and 3 the $O(h^2)$ errors. For integrating Richardson's extrapolation into a full multigrid algorithm see [10]. The following table presents the results. The last column gives the average convergence order of the extrapolated results.


Discretization Errors using Richardson Extrapolation

(x,y)        method   h=1/8      h=1/16     h=1/32     h=1/64     conv. order
(1/2, 1/2)   1        -.565e-3   -.952e-4   -.209e-4   -.502e-5   2.27
             2        -.523e-5   -.243e-6   -.167e-7   -.333e-8   3.53
             3        -.359e-4   -.185e-5   -.123e-6   -.133e-7   3.79
(1/4, 1/2)   1        -.167e-3   -.578e-4   -.158e-4   -.410e-5   1.78
             2         .202e-4    .130e-5    .700e-7    .333e-8   4.19
             3         .368e-4    .385e-5    .527e-6    .867e-7   2.91
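For a pair of results on meshes $h$ and $h/2$ with an error term $c\,h^p$, Richardson extrapolation takes the form $(2^p u_{h/2} - u_h)/(2^p - 1)$, with $p = 1.5$ for method 1 and $p = 2$ for methods 2 and 3. The Python sketch below is a generic illustration of this formula (our own, not the full multigrid variant of [10]). Because the formula is linear and its weights sum to one, it can be applied directly to the tabulated errors, which approximately reproduces entries of the second table.

```python
def richardson(coarse: float, fine: float, p: float) -> float:
    """Combine results on meshes h (coarse) and h/2 (fine) so that an error term c h^p cancels."""
    weight = 2.0 ** p
    return (weight * fine - coarse) / (weight - 1.0)


def model(h: float) -> float:
    """Synthetic 'numerical solution' 1 + 3 h^1.5 + 5 h^2 (exact value 1)."""
    return 1.0 + 3.0 * h ** 1.5 + 5.0 * h ** 2


print(richardson(model(0.1), model(0.05), 1.5) - 1.0)  # only an O(h^2) error remains

# Method 2 at (1/2, 1/2): errors for h = 1/8 and h = 1/16, p = 2
print(richardson(0.864e-3, 0.212e-3, 2.0))  # about -5.3e-6; the table lists -.523e-5
```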

Rude 


531 


4.  Point  Loads  and  Higher  Derivatives  of  Delta  Functions. 

Singular solutions can also be caused by the right-hand side of the equation. An especially interesting case is that of point loads, that is, source terms which contain delta functions (or their derivatives). When trying to solve such a problem numerically, the first question is how to model the unbounded delta function. At this point a similar trick as in the previous section is used. The numerical equivalent of the delta function is obtained by integrating the original delta function and taking the corresponding finite differences of the result. This is a smoothing operation analogous to the one used in the previous section. The following theorem gives the details.

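We start with a definition. The (generalized) functions $H_i$ ($i$ integer) are defined by

$$H_0(x) := \begin{cases} 0 & \text{for } x < 0, \\ \tfrac{1}{2} & \text{for } x = 0, \\ 1 & \text{for } x > 0, \end{cases}$$

and recursively

$$H_i(x) := \begin{cases} \dfrac{d}{dx} H_{i-1}(x) & \text{for } i > 0, \\[1ex] \displaystyle\int_{-\infty}^{x} H_{i+1}(\xi)\,d\xi & \text{for } i < 0. \end{cases}$$

Derivatives are to be understood in the distributional sense. Note that $H_0$ is the Heaviside step function and $H_1$ is the Dirac $\delta$-function.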

Theorem 4.1:

Let $u^*$ be the (weak) solution of the boundary value problem

$$\Delta u = H_\mu(x-x_0)\,H_\nu(y-y_0) \text{ in } \Omega, \qquad u = 0 \text{ on } \partial\Omega,$$

where $\mu, \nu$ are positive integers and $(x_0, y_0) \in \Omega$. Let $u_h$ be the solution of

$$\Delta_h u_h = f_h \text{ in } \Omega_h, \qquad u_h = 0 \text{ on } \partial\Omega_h,$$

where

$$f_h = \delta_x^{2m}\delta_y^{2n}\, H_{\mu-2m}(x-x_0)\,H_{\nu-2n}(y-y_0),$$

and where $n, m$ are chosen such that $2m > \mu$ and $2n > \nu$, so that $H_{\mu-2m}(x-x_0)\,H_{\nu-2n}(y-y_0)$ is a continuous function.

Then

$$u_h = u^* + h^2 r_h,$$

where $r_h$ is h-bounded on $\Omega - \{(x_0,y_0)\}$.

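Proof:

Smooth boundary values (in the rectangle) cause errors with $h^2$ expansions. Thus we may change the boundary values (of the differential and the numerical problem) to

$$u(x,y) = g(x,y) \text{ on } \partial\Omega,$$

where

$$g(x,y) = \frac{1}{2\pi}\,\frac{\partial^{\mu+\nu-2}}{\partial x^{\mu-1}\,\partial y^{\nu-1}}\,\ln\left[(x-x_0)^2 + (y-y_0)^2\right]^{1/2}.$$

The solution is then given by

$$u^*(x,y) = g(x,y).$$

We study the 'integrated' problem

$$\Delta U = H_{\mu-2m}(x-x_0)\,H_{\nu-2n}(y-y_0) \text{ in } \Omega, \qquad U = F(x,y) \text{ on } \partial\Omega,$$

where

$$\frac{\partial^{2m+2n}}{\partial x^{2m}\,\partial y^{2n}}\,F(x,y) = g(x,y),$$

and its numerical equivalent

$$\Delta_h U_h = H_{\mu-2m}(x-x_0)\,H_{\nu-2n}(y-y_0) \text{ in } \Omega_h, \qquad U_h = F(x,y) \text{ on } \partial\Omega_h.$$

Define $R_h$ by

$$U_h = U^* + h^2 R_h.$$

$R_h$ satisfies

$$\Delta_h R_h = Q_h \text{ in } \Omega_h, \qquad R_h = 0 \text{ on } \partial\Omega_h,$$

where

$$Q_h = \frac{1}{h^2}\left[H_{\mu-2m}(x-x_0)\,H_{\nu-2n}(y-y_0) - \Delta_h U^*\right].$$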



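Using Taylor expansions and smoothness properties of $U^*$ for estimating the remainder terms, we can show that

1. $Q_h$ is bounded on $\bar\Omega_h$,

2. $\delta_x^{2i}\delta_y^{2j} Q_h$ is h-bounded on $\Omega - \{(x_0,y_0)\}$ for all $0 \le i \le m$ and $0 \le j \le n$.

Because of these two properties, $R_h$ is bounded on $\bar\Omega_h$, and by Theorem 2.1 even $\delta_x^{2m}\delta_y^{2n} R_h$ is h-bounded on $\Omega - \{(x_0,y_0)\}$. Now we define

$$\tilde u_h := \delta_x^{2m}\delta_y^{2n}\, U_h \text{ on } \bar\Omega_h,$$

where the necessary values of $U_h$ outside $\bar\Omega_h$ are assumed such that $U_h$ satisfies $U_h = F(x,y)$ on the boundary of $\Omega_h$ (and at sufficiently many points outside $\bar\Omega_h$), too. $\tilde u_h$ satisfies

$$\Delta_h \tilde u_h = f_h \text{ on } \Omega_h, \qquad \tilde u_h = g(x,y) + h^2\,\tilde r_h \text{ on } \partial\Omega_h,$$

where (because of the smoothness of $u^*$ along the boundary) $\tilde r_h$ is h-bounded.

Thus $u_h$ and $\tilde u_h$ are solutions to almost the same numerical boundary value problem and, because of the discrete maximum principle, cannot differ by more than $O(h^2)$. Furthermore,

$$\tilde u_h = \delta_x^{2m}\delta_y^{2n} U^* + h^2\,\delta_x^{2m}\delta_y^{2n} R_h = u^* + \left[\delta_x^{2m}\delta_y^{2n} U^* - u^*\right] + h^2\,\delta_x^{2m}\delta_y^{2n} R_h.$$

Now $h^{-2}\left[\delta_x^{2m}\delta_y^{2n} U^* - u^*\right]$ and $\delta_x^{2m}\delta_y^{2n} R_h$ are h-bounded on $\Omega - \{(x_0,y_0)\}$. This proves the theorem. □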


Numerical  Example. 

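We give a numerical example. The test problem is as in Theorem 4.1 with a point load, i.e. $\mu = \nu = 1$. In the numerical equivalent the right-hand side is given by

$$f_h(x,y) = \begin{cases} 0 & \text{for } |x-x_0| \ge h \text{ or } |y-y_0| \ge h, \\[1ex] \dfrac{(h-|x-x_0|)\,(h-|y-y_0|)}{h^4} & \text{for } |x-x_0| < h \text{ and } |y-y_0| < h. \end{cases}$$

This is equivalent to

$$f_h(x,y) = \delta_x^2\delta_y^2\, H_{-1}(x-x_0)\,H_{-1}(y-y_0).$$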
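The discrete point load can be evaluated in two ways: directly from the piecewise bilinear formula, or as $\delta_x^2\delta_y^2$ applied to the product of ramp functions $H_{-1}$. The Python sketch below (helper names are ours) checks that both agree on the grid and that the discrete load has unit mass, $h^2\sum f_h = 1$, also for the non-meshpoint $(x_0,y_0) = (1/3,1/3)$ used below.

```python
def ramp(t: float) -> float:
    """H_{-1}(t) = max(t, 0), the integral of the Heaviside function H_0."""
    return max(t, 0.0)


def hat(t: float, h: float) -> float:
    """(h - |t|) / h^2 for |t| < h and 0 otherwise; equals delta^2 applied to the ramp."""
    return max(h - abs(t), 0.0) / h**2


def point_load_rhs(x: float, y: float, x0: float, y0: float, h: float) -> float:
    """f_h(x, y) = (h - |x - x0|)(h - |y - y0|) / h^4 near (x0, y0), zero elsewhere."""
    return hat(x - x0, h) * hat(y - y0, h)


def differenced_ramps(x: float, y: float, x0: float, y0: float, h: float) -> float:
    """delta_x^2 delta_y^2 [H_{-1}(x - x0) H_{-1}(y - y0)], computed directly."""
    dx = (ramp(x - h - x0) - 2.0 * ramp(x - x0) + ramp(x + h - x0)) / h**2
    dy = (ramp(y - h - y0) - 2.0 * ramp(y - y0) + ramp(y + h - y0)) / h**2
    return dx * dy


def discrete_mass(N: int, x0: float, y0: float) -> float:
    """h^2 times the sum of f_h over the interior grid points of Omega_h."""
    h = 1.0 / N
    return h * h * sum(point_load_rhs(i * h, j * h, x0, y0, h)
                       for i in range(1, N) for j in range(1, N))


print(discrete_mass(8, 1 / 3, 1 / 3))  # 1 up to rounding
```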

In the example we use $(x_0, y_0) = (1/3, 1/3)$. Note that this is not a meshpoint. The boundary values are chosen correspondingly, as in the proof of Theorem 4.1, so that the true solution of the example is given by

$$u^*(x,y) = \frac{1}{2\pi}\,\ln\left[(x-1/3)^2 + (y-1/3)^2\right]^{1/2}.$$

The table summarizes experimental results for this problem. For different values of $h$ we measure the error at fixed points inside the region $\Omega$.


Discretization Errors for Point Loads

point        h=1/8       h=1/16      h=1/32      h=1/64      h=1/128
(1/2, 1/2)   2.266e-03   9.097e-04   2.176e-04   6.028e-05   1.459e-05
(3/4, 3/4)   4.661e-04   1.387e-04   3.380e-05   8.719e-06   2.155e-06

The predicted $O(h^2)$ convergence behavior can be observed for $h < 1/16$.

5. τ-Extrapolation in the Presence of Singularities.

A very interesting technique for improving the accuracy of solutions in the multigrid method is the so-called τ-extrapolation. In this section we will discuss how τ-extrapolation can be used in the presence of singularities.

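In a regular multigrid algorithm two iterations are used alternatingly. The smoother:

$$u^{(\mathrm{new})} = u^{(\mathrm{old})} + S\,(f_h - L_h u^{(\mathrm{old})})$$

and the coarse grid correction (for a two-grid method):

$$u^{(\mathrm{new})} = u^{(\mathrm{old})} + I_H^h L_H^{-1} I_h^H\,(f_h - L_h u^{(\mathrm{old})}).$$

These two iterations have a common fixed point described by $f_h - L_h u_h = 0$.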

The typical efficiency of multigrid as a solver of the fixed point equation is caused by the different convergence properties. The smoother converges fast for certain (usually the high frequency) solution components, but converges only slowly for the remaining (low frequency) modes. In contrast, the coarse grid correction converges fast for the low frequency modes, but diverges (slowly) for the high frequency components. If these contrary properties are combined, the usual multigrid efficiency is obtained. In this consideration multigrid is viewed only as a solver for a discrete system of equations. The accuracy of the solution with respect to the solution of the differential problem is only related to the discretization of $L$. In this perspective multigrid has no effect on the accuracy of the numerical solution with respect to the analytical solution. Certain properties of the analytical solution, like the presence of singularities, may of course have effects on the convergence of the multigrid solver.

The multigrid technique, however, also offers algorithmic possibilities to improve not only the algebraic convergence (convergence towards the discrete solution $u_h$) but also the differential convergence (the accuracy of $u_h$ with respect to the analytically correct solution).

The different convergence properties imply that high frequency solution modes in the discrete solution are mainly supplied by the smoothing process, while low frequency components are contributed by the coarse grid correction. On the other hand, consistency is a property of low frequency modes. This leads to the idea of double discretization: in the coarse grid correction process, higher order discretizations may be used. Because of the above considerations, one may hope that the final solution accuracy is improved despite the smoother being applied with respect to the old, less accurate discretization. This is the basic idea of double discretization. It can also be understood as a special, multigrid-specific defect correction technique. A rigorous analysis can be found in [1] and [5].

τ-extrapolation, finally, is a special defect correction (double discretization) technique. The coarse grid correction is changed to

$$u^{(\mathrm{new})} = u^{(\mathrm{old})} + I_H^h L_H^{-1}\left((1-\alpha)\,(f_H - L_H \hat I_h^H u^{(\mathrm{old})}) + \alpha\, I_h^H (f_h - L_h u^{(\mathrm{old})})\right).$$

The defect $f_h - L_h u_h$ is replaced by a linear combination of defects of different grid levels. A high order difference scheme is constructed by combining the low order difference operators on different grids.

This  differs  from  a  standard  truncation  error  extrapolation  by  still  using 
smoothing  with  respect  to  the  old,  low  order  equations.  Two  iterative 
processes  having  different  fixed  points  are  applied.  This  may  cause  problems. 
Details  can  be  found  in  [3,9]. 

Assume the true solution is sufficiently smooth and the truncation errors have asymptotic expansions in terms of $h^2$. If the restrictions $I_h^H$ are chosen appropriately, the linear combination of defects with $\alpha = 4/3$ will be of order $O(h^4)$. Further assuming that the multigrid iteration converges with a contraction rate independent of $h$, one can show that the limit of the method cannot differ from the true solution by more than $O(h^4)$. For this one must observe that a standard smoother like Gauss-Seidel changes smooth solutions only by $O(h^4)$.
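The choice of the parameter can be checked in a one-dimensional model. Requiring $(1-\alpha)\,2^p c\,h^p + \alpha\,c\,h^p = 0$ for a truncation error term $c\,h^p$ gives $\alpha = 2^p/(2^p-1)$, i.e. $\alpha = 4/3$ for $p = 2$. The Python sketch below is a simplified illustration under our own assumptions (a 1D three-point operator, a known smooth solution, and injection as restriction, so that both defects are evaluated at the same coarse-grid point); it is not the multigrid code used in the experiments. With $\alpha = 4/3$ the combined defect should shrink by about $2^4 = 16$ when $h$ is halved.

```python
import math
from typing import Callable

Fn = Callable[[float], float]


def tau_alpha(p: float) -> float:
    """Parameter alpha cancelling an O(h^p) defect term in
    (1 - alpha) * coarse_defect + alpha * fine_defect for mesh ratio 2."""
    return 2.0 ** p / (2.0 ** p - 1.0)


def defect(u: Fn, f: Fn, x: float, h: float) -> float:
    """1D model defect f(x) - L_h u(x), L_h the 3-point second difference."""
    return f(x) - (u(x - h) - 2.0 * u(x) + u(x + h)) / h**2


def combined_defect(u: Fn, f: Fn, x: float, h: float, alpha: float) -> float:
    """(1 - alpha) * defect on mesh 2h + alpha * defect on mesh h, at a coarse-grid point x."""
    return (1.0 - alpha) * defect(u, f, x, 2.0 * h) + alpha * defect(u, f, x, h)


def u_exact(x: float) -> float:
    return math.sin(x)


def f_rhs(x: float) -> float:
    return -math.sin(x)  # u'' for u = sin


print(tau_alpha(2.0))   # 4/3
print(tau_alpha(1.25))  # parameter for h^1.25 terms
a = tau_alpha(2.0)
print(combined_defect(u_exact, f_rhs, 0.5, 0.1, a)
      / combined_defect(u_exact, f_rhs, 0.5, 0.05, a))  # close to 16
```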

All these considerations depend on the smoothness of the true solution. The improvement in accuracy must be expected to fail in the presence of singularities. In the following we will present experimental results showing that τ-extrapolation may still be used, even when the solution has singularities. However, one must either apply modifications as introduced above, or use extrapolation parameters adapted to the special asymptotic behavior.

Numerical  Example. 

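We demonstrate the effectiveness of τ-extrapolation with a numerical experiment. The example is

$$\Delta u = 0 \text{ on } \Omega, \qquad u = \operatorname{Re}\left[\left((x+iy) - \tfrac{1}{2}\right)^{1/4}\right] \text{ on } \partial\Omega.$$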

The true solution is depicted in the following figure.


Figure  2. 
We compare several approaches. The problem is first discretized and solved without modification; this is method 1. For method 2 we use τ-extrapolation where the extrapolation parameter is chosen such that possible $h^2$ terms in the defect are eliminated. In the third variant a similar extrapolation is performed, but here the parameter is chosen such that terms of order $h^{1.25}$ are eliminated. The last two experiments combine the corrections of section 3 with τ-extrapolation. Experiment 4 gives the results for a solution with modifications as described in Theorem 3.1, but uses no extrapolation yet. The 5th experiment uses the same modified boundary values, but additionally applies τ-extrapolation eliminating $h^2$ terms. The table gives the results. The error is again measured at fixed points of the domain. The last column gives the average convergence order.


Errors for Variants of Multigrid Methods with τ-Extrapolation

(x,y)        method   h=1/8      h=1/16     h=1/32     h=1/64     h=1/128    conv. order
(1/2, 1/2)   1        -3.88e-2   -1.51e-2   -6.18e-3   -2.56e-3   -1.07e-3   1.29
             2        -1.87e-2   -7.35e-3   -3.20e-3   -1.35e-3   -5.72e-4   1.25
             3         4.74e-3    1.76e-3    2.92e-4    6.01e-5    1.36e-5   2.11
             4        -1.40e-3   -3.36e-4   -8.33e-5   -2.07e-5   -5.18e-6   2.02
             5        -6.63e-4   -3.33e-5   -1.70e-6   -4.23e-8    3.26e-9   4.40
(1/4, 1/2)   1        -2.27e-2   -9.47e-3   -3.94e-3   -1.64e-3   -6.87e-4   1.26
             2        -1.13e-2   -4.98e-3   -2.08e-3   -8.76e-4   -3.68e-4   1.23
             3         1.77e-3    3.47e-4    1.15e-4    2.62e-5    6.31e-6   2.03
             4        -3.67e-4   -1.04e-4   -2.67e-5   -6.73e-6   -1.68e-6   1.94
             5        -6.32e-5   -1.30e-5   -4.16e-7   -2.76e-8   -9.80e-10  3.99
(1/2, h)     1        -2.52e-1   -2.16e-1   -1.82e-1   -1.53e-1   -1.29e-1   0.24
             2        -2.27e-1   -1.93e-1   -1.62e-1   -1.37e-1   -1.15e-1   0.24
             3        -1.98e-1   -1.66e-1   -1.40e-1   -1.18e-1   -9.93e-2   0.24
             4        -2.44e-3   -2.00e-3   -1.67e-3   -1.40e-3   -1.18e-3   0.26
             5        -2.12e-3   -1.79e-3   -1.51e-3   -1.27e-3   -1.07e-3   0.24

Note  the  following  observations: 

1. The normal method shows $O(h^{1.25})$ convergence.

2. So does the method with τ-extrapolation of $O(h^2)$ terms. Here only the absolute size of the errors is improved by a factor, not the order of convergence.

3. When the extrapolation is performed with respect to $O(h^{1.25})$ terms, the convergence behavior is improved to $O(h^2)$.

4. A quite similar accuracy (better by a constant factor) is obtained with a completely different technique: modification of the boundary values and straightforward solution of the discrete system.

5. If these modifications are combined with τ-extrapolation of $h^2$ errors, the results are (experimentally) $O(h^4)$ accurate.

6. In the immediate neighborhood of the singularity all methods converge with order $h^{0.25}$ only. The absolute error, however, is improved by the boundary modifications.

7.  Note  that  corrections  combined  with  extrapolation  yield  better  accuracy 
with  meshsize  h=  1/8  than  the  standard  approach  with  h  =1/128.  Or,  seen 
from  the  other  side,  in  order  to  get  errors  like  10"*  with  the  standard 
algorithm,  one  would  need  meshes  as  fine  as  h  =  1/10®.  This  is  far  beyond 
what can presently be treated by any method.

Note that not all convergence results are justified by the theory of the previous sections. We never proved the existence of $h$ expansions up to order $h^4$. Additionally, there is the problem that τ-extrapolation does not improve the solution directly. Its influence is indirect and based on the truncation errors. A more detailed analysis using, e.g., techniques of [5] or [9] is necessary.

6.  Implementation. 

All  the  above  experiments  have  been  performed  with  the  Munich  Multigrid 
Workbench  (see  [7]).  The  workbench  is  a  prototype  software  package  for  the 
programming  of  multigrid  methods.  Its  main  features  are: 

Convenience: With the workbench a standard multigrid algorithm can be formulated in fewer than twenty lines of code.

Safety:  The  formulation  of  the  algorithms  is  in  terms  of  gridfunctions  and 
operators.  There  is  no  possibility  to  make  low  level  errors  with  e.g. 
arrays  and  indices. 

Convenient debug facilities: Each step of the algorithm can be performed and analyzed separately. Intermediate results can be displayed graphically, algorithms can be traced, etc.

Portability: The package is fully portable within the UNIX† environment.


† UNIX is a trademark of Bell Laboratories.
Flexibility: The approach is not restricted to any class of problems. It has been used for variable coefficient convection diffusion problems in two dimensions and is presently being extended to include three-dimensional problems.

Efficiency:  The  workbench  has  been  used  to  solve  large  problems  with  10* 
unknowns  on  microcomputers. 

Parallelism:  The  workbench  is  designed  to  make  use  of  multiprocessor 
architectures  and  may  be  used  for  distributed  computations. 

Acknowledgements 

The author wishes to thank Prof. Dr. Zenger for many stimulating discussions and P. Muszynski for several helpful comments.

References 

1.  Auzinger  and  S tetter,  H.J.,  "Defect  Corrections  and  Multigrid  Iterations," 
in  Lecture  Notes  in  Mathematics  760:  Multigrid  Methods,  Proceedings  of 
the  Conference  Held  at  Koln-Porz,  November  23-27,  1981,  ed.  Hackbusch, 
W.,  Trottenberg,  U.,  Springer  Verlag,  Berlin  (1982). 

2.  Bramble,  J.H.,  Hubbard,  B.E.,  and  Zlamal,  M.,  "Discrete  Analogues  of  the 
Dirichlet  Problem  with  Isolated  Singularities,"  Siam  J.  Num.  Anal. 
5<1)(1968). 

3.  Brandt,  A.,  "Multigrid  Techniques:  1984  Guide  with  Applications  to  Fluid 
Dynamics,"  GMD  Studien  85  (1984). 

4.  FoBmeier,  R,,  "Differenzenverfahren  hoher  Ordnung  fur  elliptische 
Randwertprobleme  mit  gekriimmten  Randern,”  Dissertation,  Technische 
Universitat  Munchen  TUM-I8411(1984). 

5.  Hackbusch,  W.,  Multigrid  Methods  and  Applications,  Springer  Verlag,  Ber¬ 
lin  (1985). 

6.  Hofmann,  P.,  "AsymptotiBche  Entwicklung  der  Diskretisierungsfehler  beim 
Dirichlet-  und  Neumann-Problem  der  Laplace-Gleichung  in  Rechtecksge- 
bieten,"  Dissertation,  Technische  Universitat  Munchen,  (). 

7.  Riide,  U.  and  Zenger,  Chr.,  "A  Workbench  for  Multigrid  Methods,"  Institut 
fur  Informatik,  Technische  Universitat  Munchen  1-8607(1986). 

8. Rüde, U. and Zenger, Chr., "On the Treatment of Singularities in the Multigrid Method," in Lecture Notes in Mathematics 1228: Multigrid Methods II, Proceedings of the Conference Held at Cologne, October 1-4, 1985, ed. Hackbusch, W., Trottenberg, U., Springer Verlag, Berlin (1986).

9. Rüde, U., "Multiple τ-Extrapolation for Multigrid Methods," Institut für Informatik, Technische Universität München I-8701 (1987).




10. Schüller, A. and Qun Lin, "Efficient High Order Algorithms for Elliptic Boundary Value Problems Combining Full Multigrid Techniques and Extrapolation Methods," Arbeitspapiere der GMD 192 (December 1985).

11. Waterman, P.C., Yob, J.M., and Abodeely, R.J., "Numerical Integration of Non-Analytic Functions," J. Math. & Phys. 43 pp. 45-50 (1964).

12. Zenger, C. and Gietl, H., "Improved Schemes for the Dirichlet Problem of Poisson's Equation in the Neighbourhood of Corners," Numerische Mathematik 30 pp. 315-332 (1978).


A  Multigrid  Approach  for  Elasticity 
Problems  on  “Thin”  Domains 


J. Ruge

Computational  Mathematics  Group 
University of Colorado at Denver
Campus  Box  170 
1100  14th  Street 
Denver,  Colorado 

A.  Brandt 

Department  of  Applied  Mathematics 
Weizmann  Institute  of  Science 
Rehovot  ISRAEL 


Several  attempts  have  been  made  to  apply  multigrid  methods 
effectively  to  elasticity  problems  on  "thin"  domains  with  a  small 
fixed  boundary.  The  main  difficulty  here  is  that  convergence 
generally  degrades  as  the  ratio  of  the  length  to  the  width  of  the 
domain  increases.  This  is  related  to  the  well-known  problem  of 
"locking".  Attempts  to  overcome  this  difficulty  have  met  with 
limited  success.  Here,  we  present  a  multigrid  method  that 
exhibits  good  convergence  factors  independent  of  the  size  of  the 
problem  and  the  number  of  levels  used.  It  relies  on  the 
introduction  of  an  auxiliary  function  which  represents  the  shear 
stress  and  the  use  of  a  modified  relaxation  scheme. 
1. INTRODUCTION

The problem of elasticity is an important one in the field of structural mechanics. A common approach to solving such problems is to use a finite element discretization based on linear or multilinear test functions, together with some solver for the resulting algebraic equations. There have been several attempts to apply multigrid methods in this way (cf. [1], [2], [3] and [4]). Good results have been obtained with multigrid when full Dirichlet boundary conditions are used or when the domain is roughly the same size in each direction. However, when the domain becomes thin (i.e., the ratio of the width or thickness to the length becomes small) and natural (or free) boundary conditions are used on most of the boundary, multigrid convergence can deteriorate. This is related to the well-known phenomenon of "locking", which occurs for such problems. That is, the discrete solution is a bad approximation to the solution of the continuous problem, particularly in the smoothest components, and the computed displacements are generally much too small. If a fine enough grid is used, the solution is accurate enough. However, when multigrid methods are applied, this difficulty reappears on the coarse grids and degrades the coarse grid correction. A method for overcoming this problem is presented here.

The  next  section  contains  a  brief  description  of  the 
elasticity  problem  and  results  of  a  straightforward  application 
of  multigrid  methods  to  the  case  of  thin  domains.  It  is  shown 
that  the  problem  cannot  be  coarsened  below  a  certain  level  (which 
depends  on  the  mesh  size  and  the  thickness  of  the  domain)  without 
impairing  convergence. 

In  Section  3,  a  simplified  model  of  the  thin  domain  problem 
is  used  to  point  out  the  problems  encountered  with  too  coarse  a 
grid. We show that two types of problems occur. The first is
that,  on  a  coarse  enough  level,  the  character  of  the  equation 
changes  due  to  the  boundary  conditions,  and  linear  interpolation 
is  not  accurate  enough  for  the  problem  as  it  stands.  The  second 
is  that  smoothing  also  deteriorates,  so  that  more  and  more 
relaxation  sweeps  are  required  for  coarser  levels. 

In Section 4, it is shown that the equations of the simplified problem obtained in Section 3 can be transformed by introducing an auxiliary function representing the shear stress, and that multigrid can be applied efficiently to the resulting system.

In  Section  5,  it  is  shown  that  the  simplified  discretization 
can  be  used  in  practice  for  the  coarse  grid.  This  leads  to  the 
development  of  a  composite  method  which  consists  of  usual 
coarsening  to  a  given  level,  then  simultaneously  switching  to  the 
simplified  discretization  and  introducing  the  auxiliary  function, 
followed  by  further  coarsening  of  the  problem  in  its  new  form. 
Results  show  that  this  approach  yields  convergence  factors  which 
are  independent  of  the  length  and  thickness  of  the  domain  as  well 
as  the  fine  grid  mesh  size. 

Finally,  Section  6  contains  remarks  about  the  extension  of 
the  method  to  3-D  elasticity  problems. 


2.  A  MULTIGRID  APPROACH  FOR  2-D  ELASTICITY  PROBLEMS 

Consider the plane-stress elasticity problem on the domain [0,1] × [0,ε] with one fixed boundary along x = 0 and free boundaries elsewhere. Given a force vector f(x,y) = (f_1(x,y), f_2(x,y))^T acting on the body, the problem is to find the displacements of each point (x,y) in the x and y directions, denoted by u(x,y) and v(x,y), respectively, which minimize the functional


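$$\frac{E}{2(1-\nu^2)}\int_{x=0}^{1}\int_{y=0}^{\varepsilon}\Big[u_x^2 + 2\nu\,u_x v_y + v_y^2 + \frac{1-\nu}{2}\,(u_y + v_x)^2\Big]\,dy\,dx \;-\; \int_{x=0}^{1}\int_{y=0}^{\varepsilon}\big(f_1 u + f_2 v\big)\,dy\,dx \qquad (1)$$

over all functions u and v which satisfy the fixed boundary condition. Here E is Young's modulus and ν is Poisson's ratio. (The results shown in this paper take ν = .3, which is a typical value in practice.) This leads to equations of the form

$$\begin{aligned} u_{xx} + \frac{1-\nu}{2}\,u_{yy} + \frac{1+\nu}{2}\,v_{xy} &= -\frac{1-\nu^2}{E}\,f_1,\\ \frac{1+\nu}{2}\,u_{xy} + \frac{1-\nu}{2}\,v_{xx} + v_{yy} &= -\frac{1-\nu^2}{E}\,f_2 \end{aligned} \qquad (2)$$

with appropriate boundary conditions.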
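As a quick consistency check on the coefficients (1-ν)/2 and (1+ν)/2 in (2), the sketch below evaluates the left-hand sides of (2) with central differences for a given displacement field. The function name `navier_lhs` and the step size are illustrative choices, not part of the original method. For quadratic fields, central differences are exact up to rounding.

```python
def navier_lhs(u, v, x, y, nu=0.3, d=1e-3):
    """Left-hand sides of the plane-stress equations (2) at (x, y).

    u, v are callables u(x, y), v(x, y); derivatives use second-order
    central differences with step d (illustrative only).
    """
    def dxx(f):
        return (f(x + d, y) - 2.0 * f(x, y) + f(x - d, y)) / d**2

    def dyy(f):
        return (f(x, y + d) - 2.0 * f(x, y) + f(x, y - d)) / d**2

    def dxy(f):
        return (f(x + d, y + d) - f(x + d, y - d)
                - f(x - d, y + d) + f(x - d, y - d)) / (4.0 * d * d)

    a = (1.0 - nu) / 2.0   # coefficient (1 - nu)/2
    b = (1.0 + nu) / 2.0   # coefficient (1 + nu)/2
    first = dxx(u) + a * dyy(u) + b * dxy(v)
    second = b * dxy(u) + a * dxx(v) + dyy(v)
    return first, second


# u = x^2, v = x*y: u_xx = 2, v_xy = 1, all other second derivatives vanish,
# so the first expression is close to 2 + 0.65 and the second close to 0.
r1, r2 = navier_lhs(lambda x, y: x * x, lambda x, y: x * y, 0.5, 0.25)
print(r1, r2)
```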

Consider  a  finite  element  discretization  of  the  problem, 
using a uniform square grid with bilinear test functions. We now
apply  multigrid  to  the  discrete  problem  in  a  straightforward  way. 
For  convenience,  assume  that  the  number  of  elements  in  each 
direction  is  a  power  of  2.  The  coarser  grids  are  obtained  by 
coarsening  the  previous  grid  by  a  factor  of  2  in  each  direction. 
When  this  is  no  longer  possible,  the  elements  are  coarsened  in 
one  direction  only,  as  illustrated  in  Fig.  1.  The  symbol  8 
denotes  the  nodes  where  u  and  v  are  defined. 


FIG.  1.  Pattern  of  coarsening  used  in  a  straightforward  multigrid 
application. 
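The coarsening pattern described above can be written as a short sketch. Here `nx` and `ny` are the hypothetical numbers of elements in the x and y directions. Both are assumed to be powers of 2 with `ny <= nx`, as for the domain [0,1] × [0,ε] with ε ≤ 1.

```python
def coarsening_hierarchy(nx, ny):
    """Element counts (nx, ny) on each level, finest first.

    Both directions are coarsened by 2 while the grid is more than one
    element thick; after that only the x direction is coarsened.
    """
    for n in (nx, ny):
        if n < 1 or n & (n - 1):
            raise ValueError("element counts must be powers of 2")
    levels = [(nx, ny)]
    while ny > 1 and nx > 1:      # standard coarsening in x and y
        nx, ny = nx // 2, ny // 2
        levels.append((nx, ny))
    while nx > 1:                 # one element thick: coarsen in x only
        nx //= 2
        levels.append((nx, ny))
    return levels


# 16 x 4 elements, e.g. h = 1/16 on [0,1] x [0,1/4]
print(coarsening_hierarchy(16, 4))
# [(16, 4), (8, 2), (4, 1), (2, 1), (1, 1)]
```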


Linear  interpolation  is  used  and  restriction  is  taken  to  be  the 
transpose  of  interpolation.  The  coarse  grid  operators  are  the 
usual  finite  element  discretizations.  Relaxation  is  standard 
Gauss-Seidel .  (Block  Gauss-Seidel,  where  both  u  and  v  at  a  point 


Tables 1a, 1b and 1c give asymptotic convergence factors per cycle for the reduction of the residual for different values of ε. In the tables, each row represents a different fine grid mesh size h_f, while the columns correspond to the mesh size h_c of the coarsest grid used in cycling. We solve the coarsest grid equation using a direct method. Note that all mesh sizes are given relative to ε; when h_c > ε, h_c is the mesh size in the x direction. The reason for compiling the tables with h_c and h_f given relative to ε is clear: the convergence factors depend very little on ε itself.
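The factors in such tables are observed quantities. A common way to estimate an asymptotic factor per cycle from a recorded history of residual norms is to take the geometric mean of the last few per-cycle reductions, as in the sketch below. The helper name and the default window of five cycles are assumptions for illustration, not the authors' procedure.

```python
def asymptotic_factor(residual_norms, window=5):
    """Geometric mean of the last `window` per-cycle residual reductions.

    residual_norms[k] is the residual norm after k cycles (k = 0 is the
    initial residual). At least two entries are required.
    """
    if len(residual_norms) < 2:
        raise ValueError("need at least two residual norms")
    k = min(window, len(residual_norms) - 1)
    return (residual_norms[-1] / residual_norms[-1 - k]) ** (1.0 / k)


history = [0.3 ** k for k in range(12)]      # synthetic: factor 0.3 per cycle
print(round(asymptotic_factor(history), 3))  # 0.3
```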

There appears to be some dependence on h_f for fixed h_c, particularly when few grids are used. This is because the best rates correspond to the 2- and 3-level cycles, which are normally somewhat better than cycles that use more levels.

Note, however, that for fixed h_c the rates approach an asymptotic value as h_f decreases. Thus, for fixed h_c, the convergence factors per cycle are essentially independent of the fine grid mesh size. (This would be more readily apparent if the tables were continued.) The dominant factor in determining convergence, though, is the size of h_c relative to ε, and convergence starts to degrade badly when h_c > ε. This clearly indicates that the problem lies in the coarse grid correction. This kind of trouble can often be cured with a W-cycle, but here a W-cycle gives little improvement for h_c much bigger than ε. In the next section, the reasons for this behavior are examined.


3.  ANALYSIS  OF  THE  THIN  DOMAIN  PROBLEM 

For thin domains (i.e., ε << 1), convergence degrades as the number of levels used to solve the problem increases. The degradation is particularly bad once the domain is coarsened in the x direction only. To see why, consider first a simplification of the problem. Assume that u is linear in y and that v is constant in y. In terms of the discrete problem, assuming that u is linear in y is nothing more than saying that the domain is one element thick (i.e., h = ε). Assuming that v is constant in y is natural for a thin domain, because one would expect the upper and lower surfaces of a thin beam or plate to have about the same shape, even when deformed. Later, results will be presented to show that these assumptions are valid, and this simplified problem formulation will actually be incorporated into the proposed method on a particular coarse grid. For now, note that these assumptions imply that we can write

$$u(x,y) = u^b(x) + \frac{y}{\varepsilon}\big(u^t(x) - u^b(x)\big) \quad\text{and}\quad v(x,y) = v^b(x), \qquad (3)$$

where the superscripts t and b denote the function values at the top and bottom of the domain, respectively. Substitute these functions into (1), ignore the second term (since it only contributes to the right-hand sides), and integrate with respect to y. The problem is then to minimize the functional

$$\int_0^1 \Big[(u^t_x)^2 + u^t_x u^b_x + (u^b_x)^2 + \frac{3(1-\nu)}{2}\Big(\frac{u^t - u^b}{\varepsilon} + v^b_x\Big)^2\Big]\,dx. \qquad (4)$$
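The y-integration that leads from (1) to (4) can be made explicit. Write s = y/ε, so that u_x = (1-s)u^b_x + s u^t_x under (3). Since v_y = 0, the following holds:

$$\int_0^\varepsilon u_x^2\,dy = \varepsilon\int_0^1\big[(1-s)\,u^b_x + s\,u^t_x\big]^2 ds = \frac{\varepsilon}{3}\Big[(u^t_x)^2 + u^t_x u^b_x + (u^b_x)^2\Big],$$

$$\int_0^\varepsilon \frac{1-\nu}{2}\,(u_y + v_x)^2\,dy = \frac{\varepsilon}{3}\cdot\frac{3(1-\nu)}{2}\Big(\frac{u^t-u^b}{\varepsilon} + v^b_x\Big)^2 .$$

The second identity holds because u_y = (u^t - u^b)/ε and v_x = v^b_x do not depend on y. The strain energy in (1) is therefore Eε/(6(1-ν²)) times the integral in (4). This constant factor only rescales the right-hand sides.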

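This gives the following equations for the functions u^t, u^b and v^b, where g_1, g_2, g_3 denote the right-hand sides arising from the load terms:

$$\begin{aligned} -(2u^t_{xx} + u^b_{xx}) + \frac{3(1-\nu)}{\varepsilon^2}\,(u^t - u^b) + \frac{3(1-\nu)}{\varepsilon}\,v^b_x &= g_1,\\ -(u^t_{xx} + 2u^b_{xx}) - \frac{3(1-\nu)}{\varepsilon^2}\,(u^t - u^b) - \frac{3(1-\nu)}{\varepsilon}\,v^b_x &= g_2,\\ -\frac{3(1-\nu)}{\varepsilon}\,(u^t_x - u^b_x) - 3(1-\nu)\,v^b_{xx} &= g_3. \end{aligned} \qquad (5)$$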

Several difficulties can now be noted, and they are described briefly here. First, the terms u^t_xx and u^b_xx become negligible when ε is very small. The resulting equations are then nearly dependent: the second is nearly the negative of the first, and the third is nearly a multiple of the x-derivative of the first. This means that relaxation on the discretized problem is very slow to converge on a given level. However, the real purpose of relaxation in multigrid methods is to smooth the error, and smoothing is also seriously impaired when ε is small. To see this, suppose (4) is discretized with mesh size h. Then, for example, in the equation for u^b at a particular point, the coefficients corresponding to u^b at neighboring points are small compared to those corresponding to u^t and v^b (i.e., O(h^-2) compared to O(ε^-2) and O(ε^-1 h^-1)). This means that very little smoothing takes place in relaxation, so multigrid convergence is very slow. Since h increases on coarser levels, smoothing becomes even worse there.
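The coefficient comparison above can be quantified with a rough finite-difference estimate. Assume a three-point central difference for u^b_xx and read off the coefficient 3(1-ν)/ε² of (u^t - u^b) in the u^b equation of (5). The ratio of a neighbouring u^b coefficient to this local coupling coefficient is then 2ε²/(3(1-ν)h²). The sketch below tabulates that ratio. The function name is hypothetical, and the finite element coefficients differ from these by constant factors only.

```python
def neighbour_to_coupling_ratio(h, eps, nu=0.3):
    """Ratio of the neighbour coefficient 2/h^2 (from -2 u^b_xx with a
    three-point difference) to the local coupling coefficient
    3(1 - nu)/eps^2 in the u^b equation."""
    neighbour = 2.0 / h**2
    coupling = 3.0 * (1.0 - nu) / eps**2
    return neighbour / coupling


eps = 2.0 ** -4
for factor in (1, 2, 4, 8):            # h = eps, 2 eps, 4 eps, 8 eps
    print(factor, round(neighbour_to_coupling_ratio(factor * eps, eps), 4))
```

Each doubling of h divides the ratio by 4. This is the quantitative form of the statement that smoothing gets worse on coarser levels.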

The second problem is not so obvious. It can be shown (by eliminating u^t and u^b from the equation for v^b) that the equation for v^b is essentially the biharmonic equation. For this reason, linear interpolation for v^b is not sufficient for V-cycle convergence independent of the problem size. Simply increasing the order of interpolation for v^b will not solve the problem, since smoothing is still bad. (Another drawback of cubic interpolation, at least in a Galerkin setting, is that the stencils of the coarse grid operators grow.)

In  the  next  section,  a  method  for  overcoming  these 
difficulties  is  outlined. 


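4. A MULTIGRID APPROACH TO THE SIMPLIFIED PROBLEM

The two difficulties with the usual multigrid approach mentioned in the previous section can be overcome relatively easily. This is done by introducing an auxiliary function P, defined as follows:

$$P = \frac{u^t - u^b}{\varepsilon} + v^b_x. \qquad (6)$$

Under assumption (3), P is just the shear strain u_y + v_x, which is proportional to the shear stress.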


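Substituting this into (5) and using (6) as the equation associated with P, the resulting system is

$$\begin{aligned} -(2u^t_{xx} + u^b_{xx}) + \frac{3(1-\nu)}{\varepsilon}\,P &= g_1 && (u^t\ \text{equation})\\ -(u^t_{xx} + 2u^b_{xx}) - \frac{3(1-\nu)}{\varepsilon}\,P &= g_2 && (u^b\ \text{equation})\\ -3(1-\nu)\,P_x &= g_3 && (v^b\ \text{equation})\\ -\frac{1}{\varepsilon}\,u^t + \frac{1}{\varepsilon}\,u^b - v^b_x + P &= 0 && (P\ \text{equation}) \end{aligned} \qquad (7)$$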

In the discrete system, the terms 2u^t_xx and 2u^b_xx yield the diagonal blocks of the u^t and u^b equations. A primary motivation for introducing P in this way is to improve the smoothing of u^t and u^b by Gauss-Seidel relaxation, by making 2u^t_xx and 2u^b_xx the dominant terms in their respective equations. Clearly, Gauss-Seidel relaxation cannot be used on the discretized system as it stands, since v^b does not appear in its own equation. However, the system can be rearranged as follows:


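$$\begin{aligned} -3(1-\nu)\,P_x &= g_3 && \text{(8a)}\\ \frac{3(1-\nu)}{\varepsilon}\,P - 2u^t_{xx} - u^b_{xx} &= g_1 && \text{(8b)}\\ -\frac{3(1-\nu)}{\varepsilon}\,P - u^t_{xx} - 2u^b_{xx} &= g_2 && \text{(8c)}\\ -P + \frac{1}{\varepsilon}\,u^t - \frac{1}{\varepsilon}\,u^b + v^b_x &= 0 && \text{(8d)} \end{aligned}$$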

With the unknowns ordered as (P, u^t, u^b, v^b), this system is block lower triangular, except for the term u^b_xx in (8b). Consider the finite element discretization of this system using linear test functions for u^t, u^b and v^b and piecewise constant test functions for P. In this form, very good smoothing results when a full relaxation sweep is defined as follows:

-  Perform  Kaczmarz  relaxation  on  (8a); 

-  Perform  Gauss-Seidel  relaxation  on  (8b); 

-  Perform  Gauss-Seidel  relaxation  on  (8c); 

-  Perform  Kaczmarz  relaxation  (restricted  to  vb)  on  (8d). 
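Two of the four steps above use Kaczmarz relaxation, in one case restricted to v^b. A single Kaczmarz step for one discrete equation a·x = b changes the unknowns along the coefficient vector so that the equation is satisfied exactly afterwards. Restricting the step means that only a chosen subset of unknowns is changed, for example the two v^b values of one element in (8d). The sketch below is a generic illustration with hypothetical names, not the authors' code.

```python
def kaczmarz_step(x, row, rhs, active=None):
    """One Kaczmarz update for the single equation sum_k row[k] * x[k] = rhs.

    Only the indices in `active` are changed (all indices if None). The update
    is x[k] += t * row[k], with t chosen so that the equation holds afterwards.
    Modifies x in place and returns it.
    """
    idx = range(len(x)) if active is None else list(active)
    residual = rhs - sum(a * xk for a, xk in zip(row, x))
    norm2 = sum(row[k] ** 2 for k in idx)
    if norm2 == 0.0:
        return x                    # equation does not involve the active unknowns
    t = residual / norm2
    for k in idx:
        x[k] += t * row[k]
    return x


print(kaczmarz_step([0.0, 0.0, 0.0], [1.0, 2.0, 0.0], 5.0))              # [1.0, 2.0, 0.0]
print(kaczmarz_step([0.0, 0.0, 0.0], [1.0, 2.0, 0.0], 5.0, active=[1]))  # [0.0, 2.5, 0.0]
```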

In addition, in the multigrid process, linear interpolation can be used for u^t, u^b and v^b, and piecewise constant interpolation can be used for P.

Asymptotic convergence factors for a V-cycle using one of the above relaxation sweeps before the coarse grid correction and one after are around .15 for the wide variety of ε and h tested. However, asymptotic factors do not mean much here. Asymptotic factors are usually an upper bound on the convergence per cycle, but for this problem the worst cycle is generally the first. Here, the simplified problem is meant to be used as a coarse grid problem, so only one or two cycles are performed before returning to the finer grids. For a V-cycle, the residual can increase in the first cycle, sometimes by a factor greater than 10, particularly when h > ε/2. The reason for this is not yet clear, but it may be related to the initial guess for P. It may be possible to choose an initial P based on u and v that results in better first-cycle V-cycle factors.
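One simple way to construct such an initial P is to evaluate (6) elementwise from the current u^t, u^b and v^b. This is offered only as a sketch under our own assumptions, and no effect on the first V-cycle is claimed. With u^t, u^b and v^b linear on each element and P piecewise constant, the element average of (u^t - u^b)/ε + v^b_x is the L2 projection of (6) onto piecewise constants. Assuming the P equation is tested with piecewise constants, this initial guess satisfies the discrete P equation exactly. The function name is hypothetical.

```python
def initial_shear(ut, ub, vb, eps, h):
    """Piecewise-constant P on each element [x_i, x_{i+1}] from (6).

    ut, ub, vb hold nodal values of u^t, u^b, v^b on a uniform grid of
    mesh size h. For linear interpolants, the element average of
    (u^t - u^b)/eps is its midpoint value, and v^b_x is constant.
    """
    P = []
    for i in range(len(vb) - 1):
        mean_du = 0.5 * ((ut[i] - ub[i]) + (ut[i + 1] - ub[i + 1]))
        slope_v = (vb[i + 1] - vb[i]) / h
        P.append(mean_du / eps + slope_v)
    return P


print(initial_shear([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], [0.0, 0.25, 0.5], eps=0.5, h=0.25))
# [3.0, 3.0]
```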

In any case, we find that we can avoid this difficulty by using F-cycles. This type of cycle has approximately the same asymptotic factors as the V-cycle. For h ≈ ε its first-cycle factors are about .22, and for h < ε they are about .15. (When h > ε, these factors can get worse, but they do not seem to exceed 1.) For this problem, it is clear that the convergence factors depend not on the coarsest grid used, but on the finest grid. In fact, the smaller h is compared to ε, the better the convergence factor. This fits well with the previous observation that, for the usual discretization, the smaller the coarsest grid mesh size, the better the convergence. If these two problems can be tied together, with the simplified problem providing the coarse grid correction for the coarsest standard grid used, then good overall results should be obtained. We show how this is done in the next section.
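V- and F-cycles differ in the order in which the levels are visited. The minimal sketch below follows the usual recursive definition, in which an F-cycle on level l uses an F-cycle followed by a V-cycle on level l-1 as its coarse grid correction. It records the sequence of visited levels, with 0 as the coarsest. It shows the cycle structure only, not the relaxation or transfer operators, and the function names are our own.

```python
def v_cycle(level, trace):
    """Append the levels visited by a V-cycle starting on `level` (0 = coarsest)."""
    trace.append(level)                 # relax (or solve directly on level 0)
    if level > 0:
        v_cycle(level - 1, trace)       # coarse grid correction
        trace.append(level)             # relax again after interpolation
    return trace


def f_cycle(level, trace):
    """F-cycle: the coarse grid correction is an F-cycle followed by a V-cycle."""
    trace.append(level)
    if level > 0:
        f_cycle(level - 1, trace)
        v_cycle(level - 1, trace)
        trace.append(level)
    return trace


print(v_cycle(2, []))   # [2, 1, 0, 1, 2]
print(f_cycle(2, []))   # [2, 1, 0, 0, 1, 1, 0, 1, 2]
```

An F-cycle therefore visits the coarse levels more often than a V-cycle but far less often than a W-cycle. With this definition, `f_cycle(L, [])` reaches the coarsest grid L + 1 times.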


5.  THE  COMPOSITE  METHOD 

Here,  we  consider  the  question  of  when  the  simplified  problem 
resulting  from  the  assumption  (3)  provides  a  good  approximation 
to the full problem. For our purposes, this amounts to asking at what stage in the usual coarsening such a simplified problem can provide a good coarse grid correction. This process of
"switching"  from  one  discretization  to  the  other  is  illustrated 
in  Figure  2.  From  the  results  obtained  so  far,  it  appears  to  be 
best  to  introduce  the  simplified  discretization  as  early  as 
possible  in  coarsening,  since  this  yields  the  best  convergence 
for  both  the  usual  problem  and  the  simplified  problem.  To  answer 
this  question,  we  only  need  to  compute  the  two-level  convergence 
factors  associated  with  the  change  of  discretizations. 

Let the discrete variables of the usual problem be denoted by u_ij = u(ih, jh) and v_ij = v(ih, jh), and those of the simplified problem by u^t_i = u(ih, ε), u^b_i = u(ih, 0) and v^b_i = v(ih, 0). Then, letting n = ε/h, interpolation from the simplified problem to the usual problem is given by

$$u_{ij} = u^b_i + \frac{j}{n}\big(u^t_i - u^b_i\big), \qquad v_{ij} = v^b_i, \qquad j = 0, \dots, n.$$
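This interpolation is straightforward to implement. The sketch below fills the n + 1 node rows of each column from the top and bottom values, assuming that n = ε/h is an integer. Plain Python lists are used, and the names are hypothetical.

```python
def interpolate_simplified(ut, ub, vb, n):
    """Interpolate (u^t_i, u^b_i, v^b_i) to u_ij, v_ij for j = 0..n.

    j = 0 is the bottom row and j = n the top row; u varies linearly in y,
    v is constant in y.
    """
    u = [[ub[i] + (j / n) * (ut[i] - ub[i]) for j in range(n + 1)]
         for i in range(len(ub))]
    v = [[vb[i]] * (n + 1) for i in range(len(vb))]
    return u, v


u, v = interpolate_simplified([2.0], [0.0], [1.0], n=4)
print(u)  # [[0.0, 0.5, 1.0, 1.5, 2.0]]
print(v)  # [[1.0, 1.0, 1.0, 1.0, 1.0]]
```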

The  coarse  grid  problem  is  then  defined  using  the  usual  Galerkin 
formulation.  Here,  we  are  concerned  with  the  stage  at  which  this 
change of problems occurs. Observed asymptotic two-level convergence factors for various values of h are given in Table 2.
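In the Galerkin formulation, the coarse operator is A_c = P^T A P, with restriction equal to the transpose of interpolation. The sketch below is a small, self-contained illustration on a 1-D Poisson model rather than the elasticity system. It checks that, with linear interpolation, the Galerkin product reproduces the coarse-grid linear finite element stiffness matrix exactly. Exact rational arithmetic via `fractions.Fraction` avoids rounding.

```python
from fractions import Fraction


def stiffness(n, h):
    """1-D linear FE stiffness matrix (1/h) tridiag(-1, 2, -1) on n interior nodes."""
    return [[(Fraction(2) if i == j else Fraction(-1) if abs(i - j) == 1 else Fraction(0)) / h
             for j in range(n)] for i in range(n)]


def prolongation(nc):
    """Linear interpolation from nc coarse to 2*nc + 1 fine interior nodes."""
    P = [[Fraction(0)] * nc for _ in range(2 * nc + 1)]
    for k in range(nc):
        P[2 * k + 1][k] = Fraction(1)        # coincident fine node
        P[2 * k][k] = Fraction(1, 2)         # left fine neighbour
        P[2 * k + 2][k] = Fraction(1, 2)     # right fine neighbour
    return P


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def galerkin(A_fine, P):
    """Coarse grid operator P^T A P (restriction = transpose of interpolation)."""
    return matmul(transpose(P), matmul(A_fine, P))


nc, h = 3, Fraction(1, 8)                 # 7 fine interior nodes on [0, 1]
A_c = galerkin(stiffness(2 * nc + 1, h), prolongation(nc))
print(A_c == stiffness(nc, 2 * h))        # True
```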


TABLE 2. Two-level factors for switching to the simplified problem.

    h       two-level factor
    ε       .13
    ε/2     .11
    ε/3     .26
    ε/4     .43
    ε/5     .57

Clearly, when the domain is wider than 3h (i.e., h < ε/3), the simplified discretization does not provide a good approximation to the finer grid problem, and thus the full cycle would not be effective. However, when h ≥ ε/3, the simplified problem can be used to obtain a very good coarse grid correction.

This  provides  the  link  which  produces  the  composite  method. 
Figure  2  illustrates  graphically  how  the  coarsening  can  proceed, 
where  the  change  in  types  of  discretizations  corresponds  to 
switching  from  the  left  column  to  the  right  column  using  the 
interpolation  defined  above  and  the  introduction  of  P. 

Let h_s denote the mesh size of the grid on which the simplified discretization and P are introduced. Tables 3a and 3b show observed asymptotic F-cycle convergence factors for various domain sizes and fine grid mesh sizes h_f. Table 3a gives results for h_s = ε, and Table 3b lists the results for the same problems with h_s = ε/2. For h_f small, the results in both tables are nearly identical. For h_f large, though, switching earlier clearly gives better results. The reason is that, in either case, the F-cycle factors are nearly the same as the two-level factors obtained with the two finest grids. (Of course, the F-cycle factor cannot be expected to be better than this two-level factor.) When h_f ≈ ε/2, the two-level factors with the usual discretization are around .35, while the new discretization gives a much better two-level factor, as seen in Table 2 above. The amount of work involved is about the same in either case, so the method of choice is to switch problems at h_s = ε/2. These results indicate a vast improvement over the standard multigrid approach.
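The sketch below encodes one reading of Fig. 2, which is an assumption since the figure is not reproduced here. The composite method uses the usual discretization from the fine grid down to mesh size h_s. It then switches to the simplified discretization at that same x-mesh size and continues coarsening the simplified problem in x only. The sketch lists the resulting levels for the recommended choice h_s = ε/2. The function name and the coarsest mesh size are illustrative.

```python
def composite_hierarchy(h_fine, eps, h_coarsest, switch_ratio=0.5):
    """Levels (discretization, mesh size) of the composite method, finest first.

    The usual 2-D discretization is coarsened by 2 until h_s = switch_ratio * eps;
    the simplified discretization (with P) is then introduced at the same x-mesh
    size and coarsened further in x only, down to h_coarsest.
    """
    h_s = switch_ratio * eps
    if h_fine > h_s:
        raise ValueError("fine grid must be at least as fine as h_s")
    levels, h = [], h_fine
    while h < h_s:
        levels.append(("usual", h))
        h *= 2.0
    if h != h_s:
        raise ValueError("h_s must be h_fine times a power of 2")
    levels.append(("usual", h))
    while h <= h_coarsest:
        levels.append(("simplified", h))
        h *= 2.0
    return levels


for level in composite_hierarchy(2.0 ** -6, 2.0 ** -3, h_coarsest=2.0 ** -1):
    print(level)
```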

FIG.  2.  Options  for  switching  to  the  new  discretization. 
TABLE 3a. Discretization switched at h_s = ε.

                              ε
    h_f      2^-1   2^-2   2^-3   2^-4   2^-5
    ε/2      .32    .34    .34    .35    .36
    ε/4      .28    .31    .29    .31    .31
    ε/8      .21    .22    .22    .22    .22
    ε/16     .21    .22    .22     *      *

TABLE 3b. Switch problems at h_s = ε/2.

                              ε
    h_f      2^-1   2^-2   2^-3   2^-4   2^-5
    ε/2      .10    .11    .11    .11    .12
    ε/4      .22    .22    .22    .23    .23
    ε/8      .21    .22    .22    .22    .22
    ε/16     .21    .22    .22     *      *

6.  EXTENSION  TO  3-D  PROBLEMS 

The method presented here has not yet been implemented for 3-D elasticity problems, but an analysis similar to that in Section 3 and an examination of the actual discrete equations indicate that it could be done. The 3-D case is, of course, more complicated. Instead of one auxiliary function, two are introduced in the case of a plate, and more (possibly as many as 4) in a beam. Relaxation is also more complex, because a distributive Gauss-Seidel scheme is necessary for effective smoothing of the transformed problem. There are indications that a similar approach can also be used for problems in truss structures, although the auxiliary functions there will not necessarily represent shear stresses.
An implicit method for the steady state solution of the thin-layer Navier-Stokes
equations  is  presented.  The  method  is  an  extension  of  a  scheme  described  earlier  to 
curvilinear,  body-fitted  coordinate  systems.  A  fast  rate  of  convergence  is  obtained 
by  using  the  multigrid  concept  in  form  of  the  Full  Approximation  Storage  (FAS) 
scheme  in  the  algorithm.  The  computational  results  for  the  interaction  of  an 
oblique  shock  wave  with  a  boundary  layer  on  a  flat  plate  and  subsonic  and 
supersonic  NACA  0012  airfoil  flows  show  the  accuracy  and  the  efficiency  of  the 
method. 


Introduction 


From the very beginning, the application of the multigrid concept showed a dramatic improvement in the rate of convergence for the solution of elliptic differential equations. Encouraged by this success, researchers attempted to achieve the same increase in the rate of convergence for problems described by hyperbolic differential equations. These investigations showed that there exist methods of solution which are well suited for the application of the multigrid technique. For example, for the steady state solution of the Euler equations, the explicit multi-stage Runge-Kutta scheme of Jameson et al. /1/ has proven to be an appropriate basic integration scheme for the multigrid technique. This method offers the important possibility of choosing the stage coefficients so that the damping of the high frequency modes is maximized, which yields a minimum computational effort on every grid level. Its efficiency has been demonstrated for a large number of subsonic and transonic airfoil flow problems /2/. Other promising algorithms are the methods of Hemker, Spekreijse /3/ and Mulder /4/, which are based on upwind discretizations. These methods apply different Riemann solvers: Mulder uses the flux-splitting method of van Leer /5/, while Hemker, Spekreijse employ the Osher scheme /6/. The large systems of equations of these implicit approximations are solved iteratively by relaxation methods. Since these schemes are sufficiently dissipative, the multigrid technique can be applied efficiently to reduce the computational work in the iteration process. Hemker, Spekreijse use nonlinear multigrid, whereas Mulder employs linear multigrid. A study of both linear and nonlinear multigrid is given by Jespersen /7/.

Little experience exists with the application of the multigrid concept to the steady state solution of the Navier-Stokes equations of a compressible fluid. One of the first schemes was developed by Chima, Johnson /8/, who combined the multigrid strategy with the explicit MacCormack method. In /9/, Shaw, Wesseling present a multigrid Navier-Stokes code which is quite similar to the multigrid Euler method of Hemker, Spekreijse /3/. Their results for different arc airfoil flow problems show the effect of the viscous terms on the convergence behaviour. In contrast to their Euler solutions, no grid-independent rate of convergence could be achieved for the Navier-Stokes solutions. The multigrid Navier-Stokes algorithm of Schröder, Hänel /10, 11/ is an extension of the upwind methods for the solution of the Euler equations developed by Jespersen /7/ and Mulder /4/, respectively. Its efficiency in comparison to other multigrid and single grid methods has been presented for several test problems in /10/.

The  main  purpose  of  this  paper  is  to  demonstrate  the  application  of  the  multigrid- 
relaxation  scheme  proposed  in  /10/  to  more  complex  laminar  viscous  flow  problems. 
The  results  of  the  interaction  of  an  oblique  shock  wave  with  a  boundary  layer  on  a 
flat  plate  and  the  subsonic  and  supersonic  flow  over  an  NACA  0012  airfoil  show 
the  accuracy  and  the  convergence  behaviour  of  the  method. 

The  Governing  Equations 

Let ρ denote the density, u, v the Cartesian velocity components, p the pressure, and μ, κ, γ the viscosity, the heat conduction coefficient and the ratio of specific heats. Then, neglecting body forces and heat sources, the conservative, non-dimensional form of the two-dimensional thin-layer approximation of the Navier-Stokes equations of a compressible fluid can be written for a generalized, body-fitted coordinate system ξ = ξ(x,y), η = η(x,y) as /12/


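$$Q_t + F_\xi + G_\eta - Re_\infty^{-1}\,S_\eta = 0, \qquad (1)$$

$$Q = J^{-1}\begin{pmatrix}\rho\\ \rho u\\ \rho v\\ e\end{pmatrix},\qquad F = J^{-1}\begin{pmatrix}\rho U\\ \rho u U + p\,\xi_x\\ \rho v U + p\,\xi_y\\ (e+p)\,U\end{pmatrix},\qquad G = J^{-1}\begin{pmatrix}\rho V\\ \rho u V + p\,\eta_x\\ \rho v V + p\,\eta_y\\ (e+p)\,V\end{pmatrix},$$

$$S = J^{-1}\begin{pmatrix}0\\ \alpha_1 u_\eta + \alpha_2 v_\eta\\ \alpha_2 u_\eta + \alpha_3 v_\eta\\ \tfrac12(\alpha_1-\alpha_4)(u^2)_\eta + \tfrac12(\alpha_3-\alpha_4)(v^2)_\eta + \alpha_2\,(uv)_\eta + \alpha_4\,(e/\rho)_\eta\end{pmatrix}$$

with

$$J = \big(x_\xi y_\eta - x_\eta y_\xi\big)^{-1},\qquad \xi_x = J\,y_\eta,\quad \xi_y = -J\,x_\eta,\quad \eta_x = -J\,y_\xi,\quad \eta_y = J\,x_\xi,$$

$$U = u\,\xi_x + v\,\xi_y,\qquad V = u\,\eta_x + v\,\eta_y,$$

$$\alpha_1 = \mu\big(\tfrac43\eta_x^2 + \eta_y^2\big),\quad \alpha_2 = \tfrac{\mu}{3}\,\eta_x\eta_y,\quad \alpha_3 = \mu\big(\eta_x^2 + \tfrac43\eta_y^2\big),\quad \alpha_4 = \frac{\gamma\kappa}{Pr_\infty}\big(\eta_x^2 + \eta_y^2\big),$$

and

$$p = (\gamma-1)\big(e - 0.5\,\rho\,(u^2+v^2)\big).$$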

The quantity J is the Jacobian of the transformation, Q represents the vector of the conservative variables, F and G are the Euler fluxes consisting of the convective and pressure terms, and S contains the transformed shear stress and heat flux terms. The Reynolds number Re_∞ = ρ_∞u_∞L/μ_∞ and the Prandtl number Pr_∞ = μ_∞c_p/κ_∞ are defined by the reference values. The viscosity and the heat conduction coefficient are evaluated from Sutherland's power law. The adiabatic no-slip condition is imposed on solid walls, and undisturbed flow conditions are assumed in the far field.
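The metric terms follow from the grid coordinates. The minimal sketch below assumes unit spacing in the computational coordinates (index i along ξ, index j along η) and second-order central differences at an interior node. It computes J = 1/(x_ξ y_η - x_η y_ξ) and then ξ_x = J y_η, ξ_y = -J x_η, η_x = -J y_ξ and η_y = J x_ξ. The function name and grid layout are our own assumptions.

```python
import math


def metrics_at(x, y, i, j):
    """Metric terms at interior node (i, j) of a structured grid.

    x[i][j], y[i][j] are node coordinates; i runs along xi, j along eta,
    with unit spacing in both computational directions.
    """
    x_xi = 0.5 * (x[i + 1][j] - x[i - 1][j])
    y_xi = 0.5 * (y[i + 1][j] - y[i - 1][j])
    x_eta = 0.5 * (x[i][j + 1] - x[i][j - 1])
    y_eta = 0.5 * (y[i][j + 1] - y[i][j - 1])
    J = 1.0 / (x_xi * y_eta - x_eta * y_xi)   # Jacobian d(xi, eta)/d(x, y)
    return {"J": J, "xi_x": J * y_eta, "xi_y": -J * x_eta,
            "eta_x": -J * y_xi, "eta_y": J * x_xi}


# Grid rotated by 30 degrees: xi = cos(t) x + sin(t) y, eta = -sin(t) x + cos(t) y
t = math.radians(30.0)
x = [[math.cos(t) * i - math.sin(t) * j for j in range(3)] for i in range(3)]
y = [[math.sin(t) * i + math.cos(t) * j for j in range(3)] for i in range(3)]
m = metrics_at(x, y, 1, 1)
print({k: round(val, 6) for k, val in m.items()})
```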


Flux-Splitting  in  Curvilinear  Coordinate  Systems 

In order to apply upwind discretization to the hyperbolic part of Eq. (1), the Euler fluxes F, G have to be split into forward-flux vectors F^+, G^+ and backward-flux vectors F^-, G^-. In this investigation, the splitting concept of van Leer is employed.
Fig. 1 Connection between the Cartesian velocity components u, v and the locally Cartesian velocity components ū, v̄.


To  apply  van  Leer's  Cartesian  splitting  formulation  to  a  curvilinear  coordinate 
system  the  components  of  the  transformed  Euler  fluxes  have  to  be  reformulated  in 
such a way that every term of F, G belongs to a locally Cartesian grid. Then the
flux  can  be  split  according  to  the  procedure  given  by  van  Leer. 
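The reformulation can be checked numerically for the third component, F_3 = J^{-1}(ρvU + pξ_y) with U = uξ_x + vξ_y. Rotating the velocity into a local frame whose first axis is normal to the lines ξ = const gives the normal and tangential components ū, v̄. Our sign convention for v̄ is an assumption, since Fig. 1 is not reproduced. With the locally Cartesian fluxes f = ρūū + p and g = ρūv̄, F_3 can then be written as J^{-1}(fξ_y + gξ_x). The sketch below evaluates both forms, and the names are hypothetical.

```python
import math


def f3_direct(rho, u, v, p, xi_x, xi_y, J):
    """Third component of F: J^{-1} (rho v U + p xi_y), with U = u xi_x + v xi_y."""
    U = u * xi_x + v * xi_y
    return (rho * v * U + p * xi_y) / J


def f3_local(rho, u, v, p, xi_x, xi_y, J):
    """Same component via locally Cartesian velocities (normal ub, tangential vb)."""
    g = math.hypot(xi_x, xi_y)
    nx, ny = xi_x / g, xi_y / g          # unit normal to the lines xi = const
    ub = u * nx + v * ny                 # normal velocity component
    vb = -u * ny + v * nx                # tangential velocity component
    f = rho * ub * ub + p                # locally Cartesian normal momentum flux
    gf = rho * ub * vb                   # locally Cartesian tangential momentum flux
    return (f * xi_y + gf * xi_x) / J


args = (1.2, 0.7, -0.2, 0.9, 0.8, 0.3, 2.5)   # rho, u, v, p, xi_x, xi_y, J
print(f3_direct(*args), f3_local(*args))       # equal up to rounding
```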

Consider, for example, the third (y-momentum) component of F,

$$F_3 = J^{-1}\big(\rho v U + p\,\xi_y\big). \qquad (2)$$

Rewritten for a locally Cartesian grid, F_3 reads

$$F_3 = J^{-1}\big[(\rho\bar u\bar u + p)\,\xi_y + \rho\bar u\bar v\,\xi_x\big] = J^{-1}\big(f\,\xi_y + g\,\xi_x\big), \qquad (3)$$

where ū, v̄ are the locally Cartesian velocity components (Fig. 1). The terms f and g are split as in a Cartesian system. Consequently, the split components F_3^± are

$$F_3^{\pm} = J^{-1} f_1^{\pm}\Big[\bar v\,\xi_x + \frac{(\gamma-1)\bar u \pm 2a}{\gamma}\,\xi_y\Big], \qquad (4)$$

with van Leer's split mass flux f_1^± = ±ρa(ū/a ± 1)²/4 and the speed of sound a = (γp/ρ)^{1/2}. Having computed all forward and backward flux components, F^± and G^± can be written for subsonic flow, |M| < 1, in the following form


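$$P^{\pm} = J^{-1}\,|\nabla m|\,p^{\pm}\begin{pmatrix}1\\ u + \dfrac{m_x}{\gamma|\nabla m|}\Big(-\dfrac{W}{|\nabla m|} \pm 2a\Big)\\ v + \dfrac{m_y}{\gamma|\nabla m|}\Big(-\dfrac{W}{|\nabla m|} \pm 2a\Big)\\ \dfrac{\big[(\gamma-1)\,W/|\nabla m| \pm 2a\big]^2}{2(\gamma^2-1)} + \dfrac{(v\,m_x - u\,m_y)^2}{2|\nabla m|^2}\end{pmatrix} \qquad (5)$$

with

$$p^{\pm} = \pm\frac{\rho a\,(M \pm 1)^2}{4},\qquad M = \frac{W}{a\,|\nabla m|},\qquad W = u\,m_x + v\,m_y.$$

Substituting m = ξ gives F^± = P^± (with f^± = p^± and U = W), and substituting m = η gives G^± = P^± (with g^± = p^± and V = W). In the case M ≥ 1,

$$P^+ = (F m_x + G m_y)\,J^{-1},\qquad P^- = 0,$$

and for M ≤ -1,

$$P^+ = 0,\qquad P^- = (F m_x + G m_y)\,J^{-1},$$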

where F, G are the Euler fluxes in a Cartesian coordinate system. Using the split fluxes F^±, G^±, the thin-layer equations can be reformulated as

$$Q_t + F^+_\xi + F^-_\xi + G^+_\eta + G^-_\eta - Re_\infty^{-1}\,S_\eta = 0. \qquad (6)$$

Eq. (6) is the basis for the numerical method.
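The splitting can be checked in code. The sketch below implements van Leer's split flux for a direction m, written in terms of the normal velocity W/|∇m| and the tangential velocity (v m_x - u m_y)/|∇m|, with p^± = ±ρa(M ± 1)²/4 and M = W/(a|∇m|). This is our reading of the subsonic formula above. It also verifies the defining property that P^+ + P^- equals the unsplit flux (F m_x + G m_y)J^{-1} for subsonic flow. The value γ = 1.4, the total energy e = p/(γ-1) + ρ(u²+v²)/2, and all function names are assumptions for illustration.

```python
import math

GAMMA = 1.4  # ratio of specific heats (assumed value for air)


def cartesian_normal_flux(rho, u, v, p, mx, my, gamma=GAMMA):
    """F m_x + G m_y for the Cartesian Euler fluxes F, G."""
    e = p / (gamma - 1.0) + 0.5 * rho * (u * u + v * v)   # total energy per volume
    W = u * mx + v * my
    return [rho * W, rho * u * W + p * mx, rho * v * W + p * my, (e + p) * W]


def van_leer_split(rho, u, v, p, mx, my, J=1.0, gamma=GAMMA):
    """Forward and backward split fluxes P^+, P^- for the direction m."""
    a = math.sqrt(gamma * p / rho)          # speed of sound
    gm = math.hypot(mx, my)                 # |grad m|
    W = u * mx + v * my
    M = W / (a * gm)
    full = [c / J for c in cartesian_normal_flux(rho, u, v, p, mx, my, gamma)]
    if M >= 1.0:
        return full, [0.0] * 4
    if M <= -1.0:
        return [0.0] * 4, full
    Un = W / gm                             # normal velocity
    Vt2 = ((v * mx - u * my) / gm) ** 2     # squared tangential velocity
    split = []
    for s in (1.0, -1.0):
        pm = s * rho * a * (M + s) ** 2 / 4.0          # p^+ (s = 1) or p^- (s = -1)
        k = (-Un + s * 2.0 * a) / (gamma * gm)
        energy = ((gamma - 1.0) * Un + s * 2.0 * a) ** 2 / (2.0 * (gamma ** 2 - 1.0)) + 0.5 * Vt2
        split.append([gm * pm / J * c for c in (1.0, u + mx * k, v + my * k, energy)])
    return split[0], split[1]


plus, minus = van_leer_split(1.0, 0.3, 0.1, 1.0 / GAMMA, 0.8, 0.6)
total = cartesian_normal_flux(1.0, 0.3, 0.1, 1.0 / GAMMA, 0.8, 0.6)
print(max(abs(a + b - c) for a, b, c in zip(plus, minus, total)))  # rounding level
```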

Method  of  Solution 
Discretization 


An implicit finite difference method is used to solve the split thin-layer equations (6). The time derivative is approximated to first order, O(Δt), by the backward Euler scheme. The steady-state operator is discretized by a spatially conservative approximation. The dependent variables and the coordinates are defined at the nodal points, while the cell boundaries are determined by averaging over the neighbouring grid points (Fig. 2).

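The time-linearized difference equation of Eq. (6) reads

$$\left[\delta_t E + \delta_\xi\hat A^{+} + \delta_\xi\hat A^{-} + \delta_\eta\hat B^{+} + \delta_\eta\hat B^{-} - Re^{-1}\delta_\eta\hat C\right]^n\Delta\hat Q^n = -\mathrm{Res}\,\hat Q^n \tag{7}$$

with

$$\mathrm{Res}\,\hat Q^n = \left[\delta_\xi\hat F^{+} + \delta_\xi\hat F^{-} + \delta_\eta\hat G^{+} + \delta_\eta\hat G^{-} - Re^{-1}\delta_\eta\hat S\right]^n \tag{7a}$$

The difference terms are defined by (Fig. 2)

$$\delta_\xi f = f_{i+1/2,j} - f_{i-1/2,j}, \qquad \delta_\eta f = f_{i,j+1/2} - f_{i,j-1/2}$$

where the subscripts i+1/2, j and i, j+1/2 denote the cell interfaces (Fig. 2), the superscript n characterizes the time level, $\delta_t$ represents the reciprocal of the time step $\Delta t = t^{n+1} - t^n$, $\Delta\hat Q^n$ corresponds to the forward difference $\Delta\hat Q^n = \hat Q^{n+1} - \hat Q^n$, E is the identity matrix and $\hat A^{\pm}$, $\hat B^{\pm}$, $\hat C$ are the Jacobians of the fluxes $\hat F^{\pm}$, $\hat G^{\pm}$, $\hat S$.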


The terms describing the shear stresses and the heat flux are approximated by second-order accurate central differences. The discretization of the convective and the pressure terms follows Godunov's approach /13/, i.e. at each interface $\Gamma_{i+1/2,j}$, $\Gamma_{i,j+1/2}$ (Fig. 2) the Euler fluxes are determined by the solution of a Riemann problem, which is provided by the flux splitting. MUSCL-type differencing /14/ is used to approximate the split Euler terms. That is, in the first step the values of the vector of the conservative variables at the nodal points are one-dimensionally extrapolated to the cell interfaces $\Gamma_{i+1/2,j}$, $\Gamma_{i,j+1/2}$:


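$$\hat Q^{-}_{i+1/2,j} = \hat Q_{i,j} + \varphi\,\delta\hat Q_{i,j}, \qquad \hat Q^{+}_{i+1/2,j} = \hat Q_{i+1,j} - \varphi\,\delta\hat Q_{i+1,j} \tag{8a,b}$$

where the higher-order correction $\delta\hat Q_{i,j}$ is formed from the one-sided differences

$$\Delta\hat Q_{i,j} = \hat Q_{i+1,j} - \hat Q_{i,j}, \qquad \nabla\hat Q_{i,j} = \hat Q_{i,j} - \hat Q_{i-1,j},$$

weighted with the lengths of the adjacent grid intervals

$$s_i = \left[(x_{i+1,j} - x_{i,j})^2 + (y_{i+1,j} - y_{i,j})^2\right]^{1/2}, \qquad s_{i-1} = \left[(x_{i,j} - x_{i-1,j})^2 + (y_{i,j} - y_{i-1,j})^2\right]^{1/2}.$$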


Similar expressions can be derived for $\hat Q^{\pm}_{i-1/2,j}$ and $\hat Q^{\pm}_{i,j\pm1/2}$. In the second step the Euler fluxes (Eq. (5)), which depend on $\hat Q^{\pm}_{i+1/2,j}$ and on the geometry of the cell, are evaluated by

$$\hat F^{\pm}_{i+1/2,j} = \hat F^{\pm}\left(\hat Q^{\mp}_{i+1/2,j},\,\Gamma_{i+1/2,j}\right), \qquad \hat G^{\pm}_{i,j+1/2} = \hat G^{\pm}\left(\hat Q^{\mp}_{i,j+1/2},\,\Gamma_{i,j+1/2}\right) \tag{9a,b}$$

Thus, if the quantity φ is zero the approximation of the convective and the pressure terms is first-order accurate; if φ is one, a spatially second-order accurate difference scheme is obtained.


In the case of the second-order approach (φ = 1), over- and undershoots occur near extrema or discontinuities. To avoid this problem the interpolation of the vector of the conservative variables has to satisfy a monotonicity condition, e.g. $\hat Q_{i,j} \le \hat Q^{-}_{i+1/2,j} \le \hat Q_{i+1,j}$ for increasing data. This can be achieved by limiting the higher-order correction term $\delta\hat Q$ by a switching function $\psi_{i,j}$. The task of the switching function is to eliminate oscillations in regions with strong gradients ($\psi_{i,j} \to 0$, i.e. a local reduction to a first-order scheme), whereas $\psi_{i,j}$ should be one in regions with smooth solutions.


Several switching functions have been developed in recent years. Above all, they are constructed to yield sharp discontinuities in solutions of the Euler equations. Other aspects have to be considered when choosing switching functions for the computation of viscous flows at high Reynolds numbers. In these flows thin viscous layers occur at solid walls, in which the distribution of the conservative variables is indeed smooth but shows a strong curvature, e.g. the profile of the velocity component tangential to the wall. If the switch factor reacts too strongly to the curvature, the numerical dissipation increases and, e.g., an unphysical skin-friction coefficient results. Thus, the switch factors should deviate only slightly from $\psi_{i,j} = 1$ when smooth variations in $\Delta\hat Q$, $\nabla\hat Q$ occur.

Fig. 3 Switching function versus $r = \Delta Q/\nabla Q$. Roe: $\psi_R$; van Leer: $\psi_{vL}$; van Albada: $\psi_{vA}$.


For an equidistant Cartesian coordinate system the behaviour of the switching functions of Roe /15/, $\psi_R = 1 - \frac{\Delta x}{2}(\ln Q_x)_x$, van Leer /16/, $\psi_{vL} = 1 - \frac{\Delta x^2}{4}\left[(\ln Q_x)_x\right]^2$, and van Albada et al. /17/, $\psi_{vA} = 1 - \frac{\Delta x^2}{2}\left[(\ln Q_x)_x\right]^2$, was investigated in regions with moderate differences in $\Delta Q$, $\nabla Q$, Fig. 3.
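These leading-order expansions can be checked numerically. The sketch below assumes symmetric forms of the three switches in terms of r (the ratio of forward to backward difference), each multiplying the mean of the two one-sided differences and normalised so that ψ(1) = 1: 2 min(1, r)/(1 + r) for Roe (taking Roe's switch to be the minmod limiter is an assumption), 4r/(1 + r)² for van Leer and 2r/(1 + r²) for van Albada. For r = 1 + δ with small δ = Δx (ln Q_x)_x > 0 these forms give 1 − δ/2, 1 − δ²/4 and 1 − δ²/2, which are the coefficients quoted above. The forms are valid for r > 0 only.

```python
# Switch factors as functions of r = (forward difference)/(backward difference),
# written in an assumed symmetric form that multiplies the mean difference and
# satisfies psi(1) = 1.  Valid for r > 0.
def psi_minmod(r):        # Roe-type (minmod) switch: kink at r = 1
    return 2.0 * min(1.0, r) / (1.0 + r)


def psi_van_leer(r):      # harmonic mean relative to the arithmetic mean
    return 4.0 * r / (1.0 + r) ** 2


def psi_van_albada(r):    # van Albada switch, smooth in r
    return 2.0 * r / (1.0 + r * r)


delta = 1e-2              # r = 1 + delta, delta ~ dx * (ln Q_x)_x
r = 1.0 + delta
for name, psi, model in [("Roe (minmod)", psi_minmod, delta / 2),
                         ("van Leer", psi_van_leer, delta ** 2 / 4),
                         ("van Albada", psi_van_albada, delta ** 2 / 2)]:
    print(f"{name:13s} 1 - psi = {1.0 - psi(r):.3e}   leading term = {model:.3e}")
```

The minmod deviation is first order in δ, while the other two are second order, which is why the Roe-type switch adds the most dissipation in smooth but strongly curved boundary-layer profiles.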


Roe's switch factor reacts most strongly to deviations of r from one. Accordingly, $\psi_R$ gives a more dissipative scheme in the boundary layer than $\psi_{vL}$ or $\psi_{vA}$; that is, the switch factors of van Leer and van Albada seem better suited for the solution of the thin-layer equations. Computations with the switching function $\psi_{vL}$, which, in contrast to $\psi_{vA}$, is a non-differentiable switch factor, show a stagnation of the convergence rate if the time step is much larger than that given by the CFL condition. For this reason the switch factor of van Albada et al. is used in this study. Written in a curvilinear coordinate system one obtains

$$\psi_{vA} = \left[\frac{2\,\Delta\hat Q\,s_{i-1}\,\nabla\hat Q\,s_i + \varepsilon}{(\Delta\hat Q\,s_{i-1})^2 + (\nabla\hat Q\,s_i)^2 + \varepsilon}\right]_{i,j} \tag{10}$$

where the quantity ε is a small number that prevents division by zero.

Schröder and Hänel

Consequently, the complete limited expressions for the extrapolated conservative variables (Eq. (8)) read


$$\hat Q^{-}_{i+1/2,j} = \hat Q_{i,j} + \varphi\,\psi_{i,j}\,\delta\hat Q_{i,j}, \qquad \hat Q^{+}_{i+1/2,j} = \hat Q_{i+1,j} - \varphi\,\psi_{i+1,j}\,\delta\hat Q_{i+1,j} \tag{11a,b}$$

with $\psi_{i,j}$ given by Eq. (10).
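The limited extrapolation can be sketched for a single variable on a uniform 1-D grid, so that all cell lengths s are equal. Two choices in this sketch are assumptions, not the paper's curvilinear weights: the higher-order correction is taken as δQ = (ΔQ + ∇Q)/4, the κ = 0 (Fromm-type) MUSCL choice, and the van Albada switch ψ = (2ΔQ∇Q + ε)/(ΔQ² + ∇Q² + ε) multiplies it. The function name and the array layout are hypothetical.

```python
import numpy as np


def muscl_van_albada(Q, phi=1.0, eps=1e-12):
    """Left/right interface states Q^-_{i+1/2}, Q^+_{i+1/2} on a uniform grid.

    Only interfaces whose two neighbouring nodes both have a forward and a
    backward difference are returned (nodes 1 .. N-2 of an N-point array).
    """
    Q = np.asarray(Q, dtype=float)
    d = np.diff(Q)                        # d[k] = Q[k+1] - Q[k]
    fwd, bwd = d[1:], d[:-1]              # Delta Q_i and nabla Q_i, nodes 1..N-2
    psi = (2.0 * fwd * bwd + eps) / (fwd ** 2 + bwd ** 2 + eps)  # van Albada
    dQ = psi * (fwd + bwd) / 4.0          # limited higher-order correction
    Qn = Q[1:-1]
    QL = Qn[:-1] + phi * dQ[:-1]          # extrapolated from node i
    QR = Qn[1:] - phi * dQ[1:]            # extrapolated from node i+1
    return QL, QR


x = np.linspace(0.0, 1.0, 11)
QL, QR = muscl_van_albada(2.0 * x + 1.0)      # linear data: psi = 1, exact
print(np.allclose(QL, QR), np.allclose(QL, 2.0 * (x[1:-2] + 0.05) + 1.0))

step = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])
QL_s, QR_s = muscl_van_albada(step)           # no overshoot at the jump
print(QL_s.min() >= -1e-9, QL_s.max() <= 1.0 + 1e-9)
```

With phi = 0 both states reduce to the nodal values (first order). On the step the switch drops to about ε, so the extrapolated values stay within the data range up to round-off of order ε.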


The implicit part of Eq. (7) is formally approximated in the same way as described for the right-hand side of Eq. (7). Since the accuracy of $\mathrm{Res}\,\hat Q$ is not influenced by that of the implicit part, the left-hand side of Eq. (7) is discretized to first-order accuracy (φ = 0).

Thus, the difference scheme is spatially second-order accurate only in the steady state. Nevertheless, one has to keep in mind that this holds only in regions where the solution is smooth ($\psi_{i,j} \to 1$). Near extrema ($\psi_{i,j} \to 0$) the difference approximation is locally first-order accurate.


Inversion  of  the  Solution  Matrix 


To solve the difference scheme of the thin-layer equations (7) it is necessary to invert a block-pentadiagonal coefficient matrix. The approximate factorization method of Beam and Warming /18/ is widely used for this purpose. In this investigation the solution matrix is inverted iteratively by a relaxation method. Since the matrix is diagonally dominant due to the upwind approach, very large time steps can be employed to determine the steady-state solution of Eq. (7). In contrast to the factored scheme of Beam and Warming, the convergence rate of the relaxation methods is less sensitive to the size of the time step. Furthermore, for very large time steps the relaxation schemes can, under certain conditions, turn into Newton's method, leading to quadratic convergence of the residual (Eq. (7a)).

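The iterative procedure between two time levels reads

$$\left[\delta_t E + \delta_\xi\hat A^{+} + \delta_\xi\hat A^{-} + \delta_\eta\hat B^{+} + \delta_\eta\hat B^{-} - Re^{-1}\delta_\eta\hat C\right]^n\Delta\hat Q^\nu = -\mathrm{Res}\,\hat Q^n \tag{12}$$

or in an abbreviated notation

$$L^n\,\Delta\hat Q^\nu = f^n \tag{12a}$$

where the superscript ν denotes the iteration index and $\Delta\hat Q^\nu = \hat Q^{n+1,\nu} - \hat Q^n$. The relaxation method used for the iteration of Eq. (12) is either a collective point Gauss-Seidel scheme in alternating directions (PAD), one step of which reads

$$(L^n\Delta\hat Q)^{\nu}_{i,j} + (L^n\Delta\hat Q)^{\nu}_{i-1,j} + (L^n\Delta\hat Q)^{\nu-1}_{i+1,j} + (L^n\Delta\hat Q)^{\nu}_{i,j-1} + (L^n\Delta\hat Q)^{\nu-1}_{i,j+1} = f^n_{i,j} \tag{13a}$$

or a collective line Gauss-Seidel scheme in alternating directions (LADI), one step of which reads

$$(L^n\Delta\hat Q)^{\nu}_{i,j} + (L^n\Delta\hat Q)^{\nu}_{i-1,j} + (L^n\Delta\hat Q)^{\nu-1}_{i+1,j} + (L^n\Delta\hat Q)^{\nu}_{i,j-1} + (L^n\Delta\hat Q)^{\nu}_{i,j+1} = f^n_{i,j} \tag{13b}$$

Here $(L^n\Delta\hat Q)_{k,l}$ denotes the contribution of the unknowns at node (k, l) to the equation at node (i, j), and the superscript indicates whether the new (ν) or the old (ν−1) iterate is used.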
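To illustrate why line relaxation can pay off on strongly stretched grids, the sketch below uses a hypothetical scalar model problem, −ε u_xx − u_yy = 1 on the unit square with zero boundary values and ε = 0.01. The strong coupling in y mimics the fine wall-normal spacing of a boundary-layer grid. It compares lexicographic point Gauss-Seidel with line Gauss-Seidel, which solves all unknowns on each line x = const at once with the Thomas algorithm. Only one line direction is used here; an alternating-direction variant would add a second sweep along lines y = const. The model, grid size and sweep count are assumptions and are not taken from the paper, which relaxes the coupled block system (12a).

```python
import numpy as np


def thomas(lower, diag, upper, rhs):
    """Tridiagonal solve; lower[0] and upper[-1] are not used."""
    n = len(rhs)
    c, d = np.zeros(n), np.zeros(n)
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for k in range(1, n):
        m = diag[k] - lower[k] * c[k - 1]
        c[k] = upper[k] / m
        d[k] = (rhs[k] - lower[k] * d[k - 1]) / m
    x = np.zeros(n)
    x[-1] = d[-1]
    for k in range(n - 2, -1, -1):
        x[k] = d[k] - c[k] * x[k + 1]
    return x


def residual(u, f, h, eps):
    """Max-norm residual of -eps*u_xx - u_yy = f (zero Dirichlet boundary)."""
    c = u[1:-1, 1:-1]
    Au = (eps * (2 * c - u[:-2, 1:-1] - u[2:, 1:-1])
          + (2 * c - u[1:-1, :-2] - u[1:-1, 2:])) / h ** 2
    return np.abs(f[1:-1, 1:-1] - Au).max()


def point_gs(u, f, h, eps):
    n = u.shape[0] - 1
    for i in range(1, n):
        for j in range(1, n):
            u[i, j] = (h ** 2 * f[i, j] + eps * (u[i - 1, j] + u[i + 1, j])
                       + u[i, j - 1] + u[i, j + 1]) / (2 * eps + 2)


def line_gs(u, f, h, eps):
    """Solve all unknowns of each line i = const simultaneously."""
    n = u.shape[0] - 1
    m = n - 1
    lower, upper = -np.ones(m), -np.ones(m)
    diag = np.full(m, 2 * eps + 2)
    for i in range(1, n):
        # boundary values u[i, 0] and u[i, n] are zero, so they add nothing
        rhs = h ** 2 * f[i, 1:-1] + eps * (u[i - 1, 1:-1] + u[i + 1, 1:-1])
        u[i, 1:-1] = thomas(lower, diag, upper, rhs)


N, eps, sweeps = 17, 0.01, 20
h = 1.0 / N
f = np.ones((N + 1, N + 1))
u_point, u_line = np.zeros_like(f), np.zeros_like(f)
for _ in range(sweeps):
    point_gs(u_point, f, h, eps)
    line_gs(u_line, f, h, eps)
res_point = residual(u_point, f, h, eps)
res_line = residual(u_line, f, h, eps)
print(f"point GS: {res_point:.2e}   line GS: {res_line:.2e}")
```

Point relaxation stalls on error components that are smooth along the strongly coupled direction. The line solver removes them in each sweep, which is the same mechanism behind the more uniform residual reduction reported for LADI.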


Since there is no time step restriction, Δt is increased as the residual decreases, yielding very large time steps in the nearly converged state. In this form, scheme (12) becomes a Switched Evolution/Relaxation (SER) scheme /19/. The iteration of Eq. (12) is terminated once the condition

$$\max_{i,j}\left[\frac{|\Delta\hat Q^\nu - \Delta\hat Q^{\nu-1}|}{|\Delta\hat Q^\nu|}\right] \le \theta$$

is satisfied, with θ a small number.
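The remark that the scheme turns into Newton's method for very large time steps can be seen on a hypothetical scalar model dq/dt = −R(q). One backward-Euler step linearised about qⁿ gives (1/Δt + R′(qⁿ)) Δq = −R(qⁿ), which becomes a Newton step once 1/Δt → 0. The sketch compares a fixed Δt with an SER-type rule Δtⁿ = Δt⁰ |R(q⁰)| / |R(qⁿ)|, under which the time step grows as the residual falls. The model residual R(q) = q³ − 2q − 5 and all parameters are illustrative.

```python
def implicit_march(R, dR, q0, dt0, ser=True, tol=1e-12, max_steps=500):
    """Backward-Euler steps (1/dt + R'(q)) dq = -R(q) for dq/dt = -R(q).

    With ser=True the time step follows dt = dt0 * |R(q0)| / |R(q)|, so that
    1/dt -> 0 and the update approaches a Newton step as R -> 0.
    Returns the final state and the number of time steps taken.
    """
    q = q0
    r0 = abs(R(q0))
    dt = dt0
    for n in range(max_steps):
        r = R(q)
        if abs(r) < tol:
            return q, n
        if ser:
            dt = dt0 * r0 / abs(r)
        q = q - r / (1.0 / dt + dR(q))
    raise RuntimeError("time marching did not converge")


def R(q):                 # hypothetical steady-state residual
    return q ** 3 - 2.0 * q - 5.0


def dR(q):                # its derivative (the scalar "Jacobian")
    return 3.0 * q ** 2 - 2.0


q_ser, n_ser = implicit_march(R, dR, 3.0, 0.1, ser=True)
q_fix, n_fix = implicit_march(R, dR, 3.0, 0.1, ser=False)
print(n_ser, n_fix)
```

With a fixed Δt the residual falls by a roughly constant factor per step (linear convergence). With the SER rule the last few steps are nearly Newton steps and the residual drops quadratically, so far fewer time steps are needed.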


The iterative solution of the discrete elliptic problem in each time step (Eq. (12)) is very costly. Therefore the multigrid concept is applied to accelerate the inversion process. Although a linear system (Eq. (12)) has to be solved, the Full Approximation Storage (FAS) scheme proposed by Brandt /20/ is used. For completeness it is described briefly.


On the finest grid $G_m$, equation (12a) can be written

$$L_m\,\Delta\hat Q^\nu_m = f_m \tag{14a}$$

where the superscript n is dropped for clarity. According to Brandt, the corrected difference equation on the coarser grids (m > c > 1) reads

$$L_{m-c}\,\Delta\hat Q^\nu_{m-c} = f_{m-c} \tag{14b}$$

with

$$f_{m-c} = r_{m-c} + L_{m-c}\left(I^{m-c}_{m-c+1}\,\Delta\hat Q^\nu_{m-c+1}\right) \tag{14c}$$

and the restricted residual of the fine grid $G_{m-c+1}$

$$r_{m-c} = \bar I^{m-c}_{m-c+1}\left(f_{m-c+1} - L_{m-c+1}\,\Delta\hat Q^\nu_{m-c+1}\right) \tag{14d}$$

After some relaxation sweeps on every grid level the coarsest grid $G_1$ is reached. Its difference equation

$$L_1\,\Delta\hat Q^\nu_1 = f_1 \tag{14e}$$

is solved with the same relaxation method as used on the finer grids. Subsequently, the correction

$$\mathrm{COR} = \Delta\hat Q^\nu_{m-c} - I^{m-c}_{m-c+1}\,\Delta\hat Q^\nu_{m-c+1} \tag{14f}$$

is bilinearly interpolated to the finer grid $G_{m-c+1}$,

$$\Delta\hat Q^\nu_{m-c+1} \leftarrow \Delta\hat Q^\nu_{m-c+1} + I^{m-c+1}_{m-c}(\mathrm{COR}) \tag{14g}$$

followed by one relaxation sweep on every grid level. The V-cycle is completed when the finest grid $G_m$ is reached.

3  m 

For the construction of the coarse grids every second fine-grid point is deleted. To restrict the variables and the residuals, full weighting operators $I^{m-c}_{m-c+1}$, $\bar I^{m-c}_{m-c+1}$ are used. Their elements are determined by the portion of the fine-grid volume which also belongs to the coarse-grid volume /11/. In the coarse-grid operator $L_{m-c}$ (m > c > 1) the Jacobians $\hat A^{\pm}$, $\hat B^{\pm}$, $\hat C$ are evaluated using the restricted variables $\hat Q_{m-c} = I^{m-c}_{m-c+1}\hat Q_{m-c+1}$ and the geometric terms $\Gamma_{m-c}$ of the coarse-grid cell.
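The structure of the FAS cycle (14a) to (14g) can be sketched on a hypothetical 1-D model problem, −u″ = f with homogeneous Dirichlet conditions, in place of the block system (12a). As in the text, the coarse-grid right-hand side is the restricted residual plus the coarse operator applied to the restricted fine-grid solution. The correction is the coarse-grid solution minus that restricted solution. Assumptions of the sketch: vertex-centred full weighting for both restrictions, linear interpolation, lexicographic Gauss-Seidel with two pre-smoothing sweeps and one post-smoothing sweep, and an exact solve on the three-point coarsest grid.

```python
import numpy as np


def apply_A(u, h):
    """1-D operator -u'' with Dirichlet boundaries (boundary rows return 0)."""
    Au = np.zeros_like(u)
    Au[1:-1] = (2.0 * u[1:-1] - u[:-2] - u[2:]) / h ** 2
    return Au


def gauss_seidel(u, f, h, sweeps):
    for _ in range(sweeps):
        for i in range(1, len(u) - 1):
            u[i] = 0.5 * (u[i - 1] + u[i + 1] + h ** 2 * f[i])


def restrict(v):
    """Full weighting onto every second point (boundary values copied)."""
    vc = v[::2].copy()
    vc[1:-1] = 0.25 * v[1:-2:2] + 0.5 * v[2:-1:2] + 0.25 * v[3::2]
    return vc


def prolong(vc):
    """Linear interpolation from the coarse to the fine grid."""
    v = np.zeros(2 * (len(vc) - 1) + 1)
    v[::2] = vc
    v[1::2] = 0.5 * (vc[:-1] + vc[1:])
    return v


def fas_vcycle(u, f, h, nu1=2, nu2=1):
    if len(u) == 3:                               # coarsest grid: solve exactly
        u[1] = 0.5 * (u[0] + u[2] + h ** 2 * f[1])
        return
    gauss_seidel(u, f, h, nu1)                    # pre-smoothing
    r = f - apply_A(u, h)                         # fine-grid residual
    uc = restrict(u)                              # restricted solution I u
    fc = restrict(r) + apply_A(uc, 2.0 * h)       # FAS right-hand side, cf. (14c)
    uc_old = uc.copy()
    fas_vcycle(uc, fc, 2.0 * h, nu1, nu2)
    u += prolong(uc - uc_old)                     # correction, cf. (14f), (14g)
    gauss_seidel(u, f, h, nu2)                    # post-smoothing


n = 64
h = 1.0 / n
x = np.linspace(0.0, 1.0, n + 1)
f = np.pi ** 2 * np.sin(np.pi * x)
u = np.zeros(n + 1)
res0 = np.abs(f - apply_A(u, h)).max()
for _ in range(10):
    fas_vcycle(u, f, h)
res10 = np.abs(f - apply_A(u, h)).max()
print(f"residual reduction after 10 V-cycles: {res10 / res0:.1e}")
```

For a linear operator FAS gives the same iterates as the correction scheme. Its advantage is that the coarse-grid operator is evaluated on restricted values of the full solution, which is also how the Jacobians on the coarse grids are formed above.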

In all computations the boundary conditions were employed explicitly with respect to the iteration level. During the multigrid cycle the boundary values are renewed only after each relaxation sweep on the finest grid; on the coarser meshes they remain unchanged. This means that, for a converged iteration, the vector of the conservative variables on the entire computational domain (including the boundaries) is at the same, new time level.


Results 


The improved convergence behaviour of the multigrid-relaxation scheme presented above has already been demonstrated in comparison with other explicit multigrid and implicit single-grid methods /10/. In the following, only the convergence rates of that multigrid-relaxation procedure are shown. The laminar flow problems computed are the interaction of an oblique shock wave with a boundary layer on a flat plate, and subsonic and supersonic flows over an NACA 0012 airfoil.

Interaction  of  an  Oblique  Shock  Wave  with  a  Boundary  Layer  on  a  Flat  Plate 

First, the shock wave boundary layer interaction is investigated. This flow problem has been studied experimentally in detail by Hakkinen et al. /21/. Its geometry is indeed very simple, but it contains most of the difficulties of a viscous supersonic flow, Fig. 4. The pressure rise across the shock discontinuity produces a (bubble-like) thickening of the boundary layer near the shock impingement point. The expansion of the flow above the boundary layer is followed by a compression, caused by the concave distribution of the boundary layer thickness, which generates the reflected shock wave. Depending on the strength of the shock wave, a separation bubble occurs.

Fig. 4 Computational domain and boundary conditions for shock boundary layer interaction computations.

The computational domain and the boundary conditions are indicated in Fig. 4. The unit reference length corresponds to the distance between the leading edge and the geometric shock impingement point on the plate. Near that point the mesh is refined in the flow direction, Fig. 5. The minimum step size in the normal direction was $10^{-4}$. Over the plate a geometric stretching is used for the step size in the x- and y-directions.

A uniform flow field is assumed as the initial distribution. At the upper boundary downstream of the shock wave, however, the variables are prescribed in accordance with the Rankine-Hugoniot conditions. After the first time step of the iteration process has been performed, normal derivative conditions are imposed along the upper boundary downstream of the geometric shock impingement point (Fig. 4), because it is not known a priori whether the reflected shock leaves the computational domain at the downstream boundary or at the upper boundary.

Fig. 5 Typical grid for shock boundary layer interaction computations.
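The state prescribed behind the incident shock follows from the Rankine-Hugoniot relations for an oblique shock. A minimal sketch for a perfect gas with γ = 1.4 is given below. Interpreting the angles quoted for the two test cases (31.347° and 32.585° at Ma∞ = 2) as incident shock angles is our assumption, and the function name is hypothetical.

```python
import math


def oblique_shock(M1, beta_deg, gamma=1.4):
    """Rankine-Hugoniot jump across an oblique shock of angle beta (degrees).

    Returns p2/p1, rho2/rho1, the flow deflection angle theta (degrees) and M2.
    """
    beta = math.radians(beta_deg)
    mn1_sq = (M1 * math.sin(beta)) ** 2          # squared normal Mach number
    if mn1_sq <= 1.0:
        raise ValueError("beta is not larger than the Mach angle: no shock")
    p_ratio = 1.0 + 2.0 * gamma / (gamma + 1.0) * (mn1_sq - 1.0)
    rho_ratio = (gamma + 1.0) * mn1_sq / ((gamma - 1.0) * mn1_sq + 2.0)
    # theta-beta-M relation for the deflection angle
    tan_theta = (2.0 / math.tan(beta) * (mn1_sq - 1.0)
                 / (M1 ** 2 * (gamma + math.cos(2.0 * beta)) + 2.0))
    theta = math.atan(tan_theta)
    mn2_sq = ((mn1_sq + 2.0 / (gamma - 1.0))
              / (2.0 * gamma / (gamma - 1.0) * mn1_sq - 1.0))
    M2 = math.sqrt(mn2_sq) / math.sin(beta - theta)
    return p_ratio, rho_ratio, math.degrees(theta), M2


for beta in (31.347, 32.585):
    p21, r21, theta, M2 = oblique_shock(2.0, beta)
    print(f"beta = {beta:.3f} deg: p2/p1 = {p21:.4f}, rho2/rho1 = {r21:.4f}, "
          f"theta = {theta:.3f} deg, M2 = {M2:.3f}")
```

Both angles lie only slightly above the Mach angle of 30° at Ma∞ = 2, so the incident shocks are weak (pressure ratios of roughly 1.1 to 1.2). The separated case differs from the attached one only by a modestly stronger shock.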

The multigrid relaxation method uses a collective point Gauss-Seidel scheme in alternating directions (PAD). The V-cycle consists of three grid levels with a finest grid of 33 × 33 grid points. Two relaxation sweeps are performed before the restriction, and one relaxation sweep each on the coarsest grid and after the prolongation. Only one V-cycle is used in each time step.

The computational results, i.e. the history of the maximum residual as a function of the time steps and the skin-friction coefficient $c_f$ versus the local Reynolds number $Re(x)$, for a flow without ($Ma_\infty = 2$, $Re_\infty = 2.84 \times 10^5$, $\theta = 31.347^\circ$) and with ($Ma_\infty = 2$, $Re_\infty = 2.96 \times 10^5$, $\theta = 32.585^\circ$) boundary layer separation are presented in Fig. 6 and Fig. 7. It is very interesting to see that in the case of the separated flow the rate of convergence remains nearly the same as for the unseparated flow, Figs. 6a and 7a. In both cases about 100 time steps are necessary to reduce the maximum residual to $10^{\dots}$. The comparison of the computed skin-friction coefficient distributions with the experimental data of Hakkinen shows excellent agreement for the flow without separation, Fig. 6b. For the other flow problem the computed separation bubble is slightly too small, Fig. 7b. In order to improve the computational results for the separated case, a finer mesh would have to be used, as was demonstrated, for instance, in /18/.

Fig. 6a Computational results for shock boundary layer interaction (without separation). Maximum residual versus time step.

Fig. 6b Computational results for shock boundary layer interaction (without separation). Skin-friction coefficient versus local Reynolds number.

Fig. 7a Computational results for shock boundary layer interaction (with separation). Maximum residual versus time step.

Fig. 7b Computational results for shock boundary layer interaction (with separation). Skin-friction coefficient versus local Reynolds number.

At first sight the number of time steps seems large compared to the convergence rate obtained by Shaw and Wesseling /9/. A direct comparison of the convergence behaviour of both methods is, however, questionable, because the computed flow problems and the grids used differ considerably. Moreover, the free-stream Reynolds number is not given in /9/. Yet it is precisely this parameter that determines the influence of the second-order derivatives on the flow field, e.g. the boundary layer thickness, and also their effect on the smoothing.

Flow over an NACA 0012 Airfoil

Next, a subsonic and a supersonic flow over an NACA 0012 airfoil are computed. In the subsonic case the free-stream Mach number is $Ma_\infty = 0.5$, the free-stream Reynolds number based on the chord length is $Re_\infty = 10^4$ and the angle of attack is $\alpha = 5^\circ$; for the supersonic problem the values are $Ma_\infty = 1.5$, $Re_\infty = 10^5$, $\alpha = 0^\circ$.


Uniform flow is prescribed as the initial distribution. At the body surface the adiabatic no-slip condition is imposed. The wall pressure $p_w$ is evaluated from the normal momentum equation. The far-field conditions are determined in accordance with the linearized characteristics of the Euler equations for a locally one-dimensional flow. More details concerning the initial distribution and the boundary conditions are given in /11/.
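The characteristic far-field treatment can be sketched with the Riemann invariants R± = u_n ± 2a/(γ − 1) of a locally one-dimensional flow normal to the boundary. The paper uses a linearised form. The sketch below instead uses the nonlinear invariants, a common variant, and returns only the normal velocity and the speed of sound; tangential velocity and entropy would be taken from the upwind side. The function name and arguments are hypothetical, and u_n is counted positive along the outward normal.

```python
def farfield_state(un_int, a_int, un_inf, a_inf, gamma=1.4):
    """Normal velocity and sound speed on an outer boundary (outward normal).

    R+ = u_n + 2a/(gamma-1) travels outwards and is taken from the interior;
    R- = u_n - 2a/(gamma-1) travels inwards and is taken from the free stream.
    """
    k = 2.0 / (gamma - 1.0)
    if un_int >= a_int:            # supersonic outflow: all information from inside
        return un_int, a_int
    if un_inf <= -a_inf:           # supersonic inflow: all information from outside
        return un_inf, a_inf
    r_plus = un_int + k * a_int
    r_minus = un_inf - k * a_inf
    un_b = 0.5 * (r_plus + r_minus)
    a_b = 0.25 * (gamma - 1.0) * (r_plus - r_minus)
    return un_b, a_b


# subsonic boundary: interior slightly faster than the free stream
print(farfield_state(0.2, 1.0, 0.0, 1.0))
```

If the interior already equals the free stream, the boundary state reproduces it exactly, so a uniform initial field is not disturbed by the far-field condition.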


An algebraically generated C-mesh as shown in Fig. 8, with 169 points in the ξ-direction (129 points around the airfoil, 40 points along the wake) and 49 points in the η-direction on the finest grid level, is used. The minimum step size normal to the surface is $10^{\dots}$. The outer boundary is located 10 chord lengths from the airfoil.


Fig. 8 Typical grid for NACA 0012 airfoil computations.

In order to obtain an efficient multigrid method for the inversion of the solution matrix, the point Gauss-Seidel relaxation scheme (PAD) has to be replaced by a line Gauss-Seidel relaxation scheme in alternating directions (LADI). The effect of this modification on the convergence behaviour of the multigrid procedure is demonstrated by comparing the residual reduction factors

$$\Omega_l(Q) = \left(\prod_{\nu=2}^{p}\frac{\|\mathrm{RES}(Q)^{\nu}\|_\infty}{\|\mathrm{RES}(Q)^{\nu-1}\|_\infty}\right)^{1/(p-1)} \tag{15}$$

computed for every component of the solution vector. The quantity p is the number of iterations, the symbol $\|\cdot\|_\infty$ represents the maximum norm and $\mathrm{RES}(Q) = f - L\,\Delta Q$. For two NACA 0012 airfoil flows

(a) subsonic: $Ma_\infty = 0.5$, $Re_\infty = 10^{\dots}$, $\alpha = 0^\circ$
(b) supersonic: $Ma_\infty = 1.5$, $Re_\infty = 10^{\dots}$, $\alpha = 0^\circ$

the factors $\Omega_l(Q)$ of a PAD and a LADI smoothing scheme used in the multigrid method are determined. In every time step five V-cycles consisting of three grid levels are performed. After the strong residual reduction at the beginning of the computation, averaged factors

$$\bar\Omega(Q) = \frac{1}{n}\sum_{l=1}^{n}\Omega_l(Q) \tag{16}$$

are evaluated from the $\Omega_l(Q)$ of n = 10 time steps. The results are compiled in the following tables.
Table 1: Averaged residual reduction factors of the continuity equation ($\bar\Omega(\rho)$), the momentum equations ($\bar\Omega(\rho u)$, $\bar\Omega(\rho v)$) and the energy equation ($\bar\Omega(e)$) for the two flow problems (a), (b); relaxation scheme: PAD

PAD    Ω̄(ρ)    Ω̄(ρu)   Ω̄(ρv)   Ω̄(e)
(a)    0.552   0.815   0.513   0.585
(b)    0.317   0.959   0.298   0.647

Table 2: Same notation as in Table 1; relaxation scheme: LADI

LADI   Ω̄(ρ)    Ω̄(ρu)   Ω̄(ρv)   Ω̄(e)
(a)    0.386   0.386   0.324   0.386
(b)    0.709   0.719   0.698   0.727

For the PAD relaxation scheme (Table 1) the factors $\bar\Omega(Q)$ of the single equations differ very strongly from each other, which is traced back to an insufficient coupling of the system. The rate of convergence is limited by the worst residual reduction factor. The factors of the LADI relaxation scheme (Table 2), however, show only slight differences. An almost uniform smoothing of the single equations is obtained, which results in better convergence behaviour. For the supersonic flow problem the residual reduction deteriorates drastically. In the direct vicinity of the shock discontinuity the distribution of the residuals shows strong gradients. In this region the residuals are not sufficiently damped, which reduces the efficiency of the present multigrid concept. With the PAD relaxation scheme no grid-independent convergence rate can be achieved, whereas with the LADI relaxation scheme grid-independent convergence behaviour is obtained, at least for the subsonic test problem.
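Eqs. (15) and (16) amount to a geometric mean of successive residual ratios within one time step, followed by an arithmetic mean over n time steps. Because the product telescopes, Ω_l equals (‖RES^p‖/‖RES^1‖)^{1/(p−1)}. The small sketch below is applied to one component of the solution vector at a time; the input format (a list of max-norm residuals, one per iteration) is an assumption.

```python
import numpy as np


def reduction_factor(res_norms):
    """Omega_l of Eq. (15): geometric mean of successive max-norm ratios."""
    r = np.asarray(res_norms, dtype=float)
    p = len(r)
    ratios = r[1:] / r[:-1]        # ||RES^nu|| / ||RES^(nu-1)||, nu = 2..p
    return float(np.prod(ratios) ** (1.0 / (p - 1)))


def averaged_factor(histories):
    """Omega-bar of Eq. (16): arithmetic mean of Omega_l over n time steps."""
    return float(np.mean([reduction_factor(h) for h in histories]))


# two hypothetical time steps with residual histories of one component
print(averaged_factor([[1.0, 0.5, 0.25], [1.0, 0.3, 0.09]]))
```

Computing the factor separately for ρ, ρu, ρv and e is what exposes the uneven smoothing of the PAD scheme in Table 1.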

The following results were computed with a multigrid procedure consisting of three grid levels and a collective line Gauss-Seidel relaxation scheme (LADI). Only at the beginning of the calculation, when the time steps are still small, is one V-cycle employed in the inversion process; for most of the computation two V-cycles are used. One relaxation sweep is performed on the coarsest grid, before the restriction and after the prolongation.

For the subsonic and the supersonic NACA 0012 airfoil flow the Mach contours and the rates of convergence towards the steady-state solution are presented in Fig. 9 and in Fig. 10. The steady state is defined to be reached if the distributions of the skin-friction coefficient and of the pressure coefficient do not vary by more than 0.1 per cent, and the number of supersonic points remains constant, within several (about 15) time steps.

Fig. 9a Laminar subsonic flow over an NACA 0012 airfoil ($Ma_\infty = 0.5$, $Re_\infty = 10^4$, $\alpha = 5^\circ$). Mach contours.

Fig. 9b Laminar subsonic flow over an NACA 0012 airfoil ($Ma_\infty = 0.5$, $Re_\infty = 10^4$, $\alpha = 5^\circ$). Maximum residual versus time step.

Fig. 10a Laminar supersonic flow over an NACA 0012 airfoil ($Ma_\infty = 1.5$, $Re_\infty = 10^5$, $\alpha = 0^\circ$). Mach contours.

Fig. 10b Laminar supersonic flow over an NACA 0012 airfoil ($Ma_\infty = 1.5$, $Re_\infty = 10^5$, $\alpha = 0^\circ$). Maximum residual versus time step.
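The steady-state criterion for the airfoil flows can be written as a small check over the last 15 time steps. The sketch makes two assumptions, since the text does not define the norm. First, the histories are stored as lists of surface distributions (NumPy arrays) plus a list of supersonic-point counts. Second, the 0.1 per cent variation is measured relative to the largest magnitude of the latest distribution. The function name is hypothetical.

```python
import numpy as np


def is_steady(cf_hist, cp_hist, n_super_hist, window=15, rel_tol=1e-3):
    """True if cf and cp changed by at most rel_tol and the number of
    supersonic points stayed fixed over the last `window` time steps."""
    if len(cf_hist) <= window:
        return False

    def small_change(hist):
        recent = np.asarray(hist[-(window + 1):], dtype=float)  # (window+1, npts)
        scale = np.abs(recent[-1]).max()
        return np.abs(recent - recent[-1]).max() <= rel_tol * scale

    counts = n_super_hist[-(window + 1):]
    return bool(small_change(cf_hist) and small_change(cp_hist)
                and len(set(counts)) == 1)


cf = [np.array([0.010, 0.020, 0.030])] * 20
cp = [np.array([-0.5, 0.1, 0.4])] * 20
print(is_steady(cf, cp, [12] * 20))          # True
print(is_steady(cf, cp, [12] * 19 + [13]))   # False
```

Checking integral surface quantities rather than only the residual guards against declaring convergence while, for example, the supersonic region is still growing slowly.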

The Mach contours of the subsonic flow problem ($Ma_\infty = 0.5$, $Re_\infty = 10^4$, $\alpha = 5^\circ$) show a clear suction peak on the upper surface of the profile, Fig. 9a. This leads to separation at about 62 per cent of the chord length. With the multigrid relaxation scheme fewer than 100 time steps are necessary to obtain a converged solution, Fig. 9b.

The computed flow field of the supersonic airfoil flow ($Ma_\infty = 1.5$, $Re_\infty = 10^5$, $\alpha = 0^\circ$) shows the position of the detached shock wave, Fig. 10a. The thickening of the shock discontinuity at some distance from the airfoil is caused by the larger space steps in this region of the computational domain (Fig. 8). About 150 time steps have to be performed to reach the steady-state solution, Fig. 10b.


Concluding  Remarks 

A multigrid-relaxation scheme for the steady-state solution of the two-dimensional thin-layer Navier-Stokes equations for the flow of a compressible fluid in a curvilinear body-fitted coordinate system has been described. The convective and the pressure terms are discretized with the flux-splitting method of van Leer. Second-order MUSCL discretization is used for the Euler terms. The terms containing the shear stresses and the heat flux are approximated by central differencing. The matrix inversion in each time step is performed iteratively with a collective Switched Evolution/Relaxation scheme and accelerated by an FAS multigrid method applied to the linear system. The computational results for several subsonic and supersonic flow problems show the accuracy and the convergence behaviour of the scheme. If one work unit (WU) is defined as the cost of one PAD or one LADI relaxation on the finest grid, fewer than 350 WU are necessary in all computations to determine the steady-state solution. There is no doubt that this is still too much work for the computation of a steady airfoil flow. For this reason we are investigating the possibility of improving the efficiency of the present method by using the Full Multigrid concept.
The SIMPLE Pressure-Correction
Method as a Nonlinear Smoother

G.J. Shaw and S. Sivaloganathan

8-11 Keble Rd., Oxford, U.K.


1.  INTRODUCTION 

The  aim  of  this  paper  is  to  analyse  the  error  smoothing 
properties  of  certain  pressure-correction  methods  by  means  of 
Fourier  analysis.  Such  an  analysis  is  justified  in  the  context  of 
a  multigrid  smoother  since  we  are  interested  only  in  the  high 
frequency  error  reduction  and  these  errors  have  a  small  domain  of 
influence. Although the analysis presented in this paper is
concentrated on the SIMPLE algorithm of Patankar and Spalding[4],
it is readily extended to include other pressure-correction
methods such as SIMPLER and SIMPLEC.

In section 3 SIMPLE is described in a setting of general
pressure-correction  methods.  This  makes  it  clear  exactly  what 
assumptions  are  made  and  which  terms  neglected.  As  a  consequence 
a  new  class  of  pressure-correction  methods  is  derived.  This  is 
analysed  and  found  to  give  better  smoothing  rates. 

A  multigrid  method  based  on  the  SIMPLE  algorithm  is 
described  by  Sivaloganathan  and  Shaw[6].  In  section  5  this  is 
used  to  obtain  empirical  smoothing  rates  for  comparison  with  the 
predicted  theoretical  rates.  It  is  found  that  the  practical 
behaviour  of  the  iteration  is  well  modelled  by  Fourier  analysis. 
2. GOVERNING EQUATIONS AND DISCRETISATION


2.1  The  Navier-Stokes  Equations 

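The Navier-Stokes equations for the steady incompressible flow of a Newtonian gas may be written

$$\frac{\partial \rho u^2}{\partial x} + \frac{\partial \rho uv}{\partial y} + \frac{\partial p}{\partial x} - \frac{\partial}{\partial x}\left(2\mu\frac{\partial u}{\partial x}\right) - \frac{\partial}{\partial y}\left(\mu\left(\frac{\partial u}{\partial y} + \frac{\partial v}{\partial x}\right)\right) = 0 \tag{2.1a}$$

$$\frac{\partial \rho uv}{\partial x} + \frac{\partial \rho v^2}{\partial y} + \frac{\partial p}{\partial y} - \frac{\partial}{\partial x}\left(\mu\left(\frac{\partial v}{\partial x} + \frac{\partial u}{\partial y}\right)\right) - \frac{\partial}{\partial y}\left(2\mu\frac{\partial v}{\partial y}\right) = 0 \tag{2.1b}$$

$$\frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} = 0 \tag{2.1c}$$

where x, y denote the co-ordinate axes, u, v are the components of the velocity in these directions and p denotes the static pressure; ρ and μ are the density and viscosity respectively, which are assumed to be given functions of x, y only.

A linearisation of (2.1) is obtained by freezing ρ, μ at ρ0, μ0 respectively, and the velocities u, v, where they contribute to non-linear terms, at u0 and v0 respectively. For simplicity it is assumed that ρ0, μ0, u0, v0 are constants. The linearised system may then be written as

$$L\,q = 0, \qquad L = \begin{pmatrix} C - \mu_0\left(2\partial_{xx} + \partial_{yy}\right) & -\mu_0\,\partial_{xy} & \partial_x \\ -\mu_0\,\partial_{xy} & C - \mu_0\left(\partial_{xx} + 2\partial_{yy}\right) & \partial_y \\ \partial_x & \partial_y & 0 \end{pmatrix}, \qquad q = (u, v, p)^T \tag{2.2}$$

where the linearised convective operator is

$$C = \rho_0\left(u_0\,\partial_x + v_0\,\partial_y\right).$$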
The existence of iterative methods with good error smoothing
properties depends upon the ellipticity of the discrete
representation of the partial differential equations in question.
This in turn depends on the ellipticity of the continuous
problem. The system (2.2) is well known to be elliptic in the
sense of Douglis and Nirenberg[3]. The determinant of the Fourier
transform $\hat L$ of L from (x, y) into (θ1, θ2) is:

$$\zeta = \det\hat L = (\theta_1^2 + \theta_2^2)\left[\mu_0(\theta_1^2 + \theta_2^2) + i\rho_0(u_0\theta_1 + v_0\theta_2)\right].$$

The system (2.2) is defined to be elliptic if ζ is non-zero
for all real θ = (θ1, θ2) not equal to (0, 0). Clearly this is the
case for all non-zero μ0, and hence the linearised problem is
elliptic. Thus the non-linear problem (2.1) is also considered to
be elliptic.

Defining the Reynolds number as $Re = \rho_0(u_0^2 + v_0^2)^{1/2}/\mu_0$, we note
that for highly convective regimes, for which the Reynolds number
is large, ellipticity is most nearly lost for lines in the
(θ1, θ2) plane perpendicular to a local streamline.

2.2  Discretisation 

The discretised equations are formed using a staggered grid with variables located as shown in figure 1. Due to the staggering of the mesh, the three different types of control volume shown in figure 2 are required for the two momentum equations and the continuity equation. The finite volume equations are derived in a standard manner by integrating (2.1a-c) over their respective control volumes. The resulting discrete equations on a uniform grid $\Omega_h \subset \Omega$ of mesh length h are

$$L_h\,q_h = 0 \tag{2.3}$$




FIG. 1 Staggered MAC Grid

FIG. 2 Control Volumes


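where $u_h$, $v_h$, $p_h$ are grid functions defined on $\Omega_h$ which approximate u, v, p. The component operators of $L_h$ are defined by:

$$a^u_h\,\phi(x,y) := a^u_P\,\phi(x,y) - a^u_E\,\phi(x+h,y) - a^u_W\,\phi(x-h,y) - a^u_N\,\phi(x,y+h) - a^u_S\,\phi(x,y-h)$$

$$a^v_h\,\phi(x,y) := a^v_P\,\phi(x,y) - a^v_E\,\phi(x+h,y) - a^v_W\,\phi(x-h,y) - a^v_N\,\phi(x,y+h) - a^v_S\,\phi(x,y-h)$$

$$\delta^x_h\,\phi(x,y) := \frac{\phi(x+h/2,y) - \phi(x-h/2,y)}{h}, \qquad \delta^y_h\,\phi(x,y) := \frac{\phi(x,y+h/2) - \phi(x,y-h/2)}{h}$$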
h 
In this paper we are not concerned with the particular form
of coefficients such as $a^u_P$, $a^v_P$ and so forth. For the
discretisation of Patankar and Spalding[4] these are given in
Sivaloganathan and Shaw[6]. Artificial viscosity is added to
ensure that the coefficients are non-negative. A more detailed
discussion of this can be found in Patankar and Spalding[4], but
essentially it results in a switch from central to donor-cell
upwinding of the convective term, plus neglect of diffusion in the
direction considered, whenever the appropriate cell Reynolds
number is greater than 2. The discrete ellipticity of $L_h$ is
established in Shaw and Sivaloganathan[5].
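The switch just described (central differencing for cell Reynolds numbers below 2, donor-cell upwinding without diffusion above) is what Patankar's hybrid scheme produces. Assuming that this is the discretisation meant, its neighbour coefficients in one direction take the compact form below. Here F is the convective flux and D the diffusive conductance at the east (e) or west (w) face of a control volume; the function name and arguments are illustrative.

```python
def hybrid_coefficients(F_e, F_w, D_e, D_w):
    """1-D hybrid-differencing coefficients (a_E, a_W, a_P), Patankar's form.

    Central differencing while |F/D| < 2; otherwise donor-cell upwinding with
    the diffusion term dropped.  All neighbour coefficients stay >= 0.
    """
    a_E = max(-F_e, D_e - 0.5 * F_e, 0.0)
    a_W = max(F_w, D_w + 0.5 * F_w, 0.0)
    a_P = a_E + a_W + (F_e - F_w)       # F_e - F_w = 0 for a divergence-free flux
    return a_E, a_W, a_P


print(hybrid_coefficients(1.0, 1.0, 1.0, 1.0))   # (0.5, 1.5, 2.0): central
print(hybrid_coefficients(3.0, 3.0, 1.0, 1.0))   # (0.0, 3.0, 3.0): upwind
```

The max with 0 is what keeps the neighbour coefficients non-negative, so the coefficient matrix keeps the sign pattern that the smoothing analysis relies on.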

The determinant C_h(θ) of the discrete Fourier transform L̂_h(θ) of L_h, from (x,y) into (θ_1,θ_2), is such that:

$$ \mathrm{Re}(C_h(\theta)) \ge \mu_0\{4(s_1^2+s_2^2)/h^2\}^2, \qquad s_1 = \sin(\theta_1/2), \quad s_2 = \sin(\theta_2/2), $$

and

$$ \mathrm{Im}(C_h(\theta)) = -4\rho_0(s_1^2+s_2^2)(u_0\sin\theta_1 + v_0\sin\theta_2)/h^3. $$

Clearly |C_h(θ)| is small only in the region of (θ_1,θ_2) = (0,0); hence L_h has a good h-ellipticity measure and the discretisation is discretely elliptic in the sense of Brandt[2]. This property is most nearly lost along lines in the θ plane such that v_0/u_0 = -sin θ_1/sin θ_2, for which Im(C_h(θ)) vanishes.
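As a quick numerical check of this statement, the sketch below evaluates the two expressions for C_h(θ) at one high frequency θ. It does so twice: once with a flow direction chosen so that v_0/u_0 = -sin θ_1/sin θ_2, and once with (u_0, v_0) = (1, 0). The values μ_0 = ρ_0 = h = 1 and the sample θ are arbitrary illustrative choices, not values from the paper, and the function name is hypothetical.

```python
import math


def symbol_parts(theta1, theta2, u0, v0, mu0=1.0, rho0=1.0, h=1.0):
    """Return (lower bound on Re C_h, Im C_h) using the expressions in the text.

    mu0, rho0 and h default to illustrative values, not values from the paper.
    """
    s1 = math.sin(theta1 / 2.0)
    s2 = math.sin(theta2 / 2.0)
    re_bound = mu0 * (4.0 * (s1 ** 2 + s2 ** 2) / h ** 2) ** 2
    im_part = (-4.0 * rho0 * (s1 ** 2 + s2 ** 2)
               * (u0 * math.sin(theta1) + v0 * math.sin(theta2)) / h ** 3)
    return re_bound, im_part


theta1, theta2 = 2.0, 1.0                          # a 'high' frequency: |theta1| > pi/2
u0 = 1.0
v0 = -u0 * math.sin(theta1) / math.sin(theta2)     # direction on the critical line
print(symbol_parts(theta1, theta2, u0, v0))        # Im part is zero up to rounding
print(symbol_parts(theta1, theta2, 1.0, 0.0))      # Im part is clearly non-zero
```

On the critical line only the viscous bound keeps |C_h(θ)| away from zero. The ratio of the imaginary to the real part scales like ρ_0 u_0 h/μ_0, a mesh Reynolds number. This is consistent with the direction-dependent smoothing at high mesh Reynolds number discussed in section 4.3.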


3.  PRESSURE  CORRECTION  METHODS 

In  this  section  we  describe  the  SIMPLE  algorithm  of  Patankar  and 
Spalding[4],  setting  it  in  a  framework  of  general  pressure 
correction  methods.  This  leads  naturally  to  the  presentation  of  a 
new  class  of  pressure-correction  algorithm. 
Consider the system

$$ L_h q_h = (f_h^u, f_h^v, f_h^p)^T \qquad (3.1) $$


which  is  equation  (2.3)  with  additional  source  terms,  which  arise 
during  the  multigrid  process. 

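Let q_h = (u_h, v_h, p_h) be an approximate solution to (3.1), and let q̄_h = (ū_h, v̄_h, p̄_h) = q_h + δq = q_h + (δu, δv, δp) be the approximation obtained from q_h after a pressure-correction iteration. Substituting q̄_h into equation (3.1) gives:

$$ (a_h^u u_h - \mu\delta_h^x\delta_h^y v_h + \delta_h^x p_h - f_h^u) + (a_h^u\delta u - \mu\delta_h^x\delta_h^y\delta v + \delta_h^x\delta p) = 0 \qquad (3.2a) $$

$$ (a_h^v v_h - \mu\delta_h^x\delta_h^y u_h + \delta_h^y p_h - f_h^v) + (a_h^v\delta v - \mu\delta_h^x\delta_h^y\delta u + \delta_h^y\delta p) = 0 \qquad (3.2b) $$

$$ (\delta_h^x u_h + \delta_h^y v_h - f_h^p) + (\delta_h^x\delta u + \delta_h^y\delta v) = 0 \qquad (3.2c) $$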

Solving (3.2) for δq yields an exact solution q̄_h to (3.1). However, the system (3.2) is as difficult to solve as (3.1) itself. Therefore we attempt to solve a simplified problem. Particular pressure-correction methods described in the literature arise from making various assumptions concerning the terms in (3.2). The first and most common is that the current approximation q_h already satisfies the momentum equations of (3.1). To justify this assumption, the pressure-correction step is normally applied only after an approximate, independent solution of the momentum equations. In practice this is often interpreted as the application of one or two line relaxation sweeps.

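Let

$$ r_h^u = a_h^u u_h - \mu\delta_h^x\delta_h^y v_h + \delta_h^x p_h - f_h^u, $$

$$ r_h^v = a_h^v v_h - \mu\delta_h^x\delta_h^y u_h + \delta_h^y p_h - f_h^v, $$

$$ r_h^p = \delta_h^x u_h + \delta_h^y v_h - f_h^p. $$

Equations (3.2) may be written: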
$$ a_h^u\delta u - \mu\delta_h^x\delta_h^y\delta v + \delta_h^x\delta p = -\beta r_h^u, $$

$$ a_h^v\delta v - \mu\delta_h^x\delta_h^y\delta u + \delta_h^y\delta p = -\beta r_h^v, $$

$$ \delta_h^x\delta u + \delta_h^y\delta v = -r_h^p, $$

where β = 1. If β is set to zero, the assumption discussed above is made. The SIMPLE pressure-correction method further neglects the mixed-derivative terms and diagonalises the operators a_h^u and a_h^v to obtain:


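$$ d_h^u\,\delta u = -\delta_h^x\delta p - \beta r_h^u, $$

$$ d_h^v\,\delta v = -\delta_h^y\delta p - \beta r_h^v, $$

$$ \delta_h^x\delta u + \delta_h^y\delta v = -r_h^p, $$

where d_h^u φ(x,y) = a_P^u φ(x,y) and d_h^v φ(x,y) = a_P^v φ(x,y). Hence the corrections δq satisfy: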


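$$ \delta u = -(d_h^u)^{-1}(\delta_h^x\delta p + \beta r_h^u) \qquad (3.3a) $$

$$ \delta v = -(d_h^v)^{-1}(\delta_h^y\delta p + \beta r_h^v) \qquad (3.3b) $$

$$ \delta_h^x(d_h^u)^{-1}\delta_h^x\delta p + \delta_h^y(d_h^v)^{-1}\delta_h^y\delta p = r_h^p - \beta\left[\delta_h^x(d_h^u)^{-1}r_h^u + \delta_h^y(d_h^v)^{-1}r_h^v\right] \qquad (3.3c) $$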


The algorithm proceeds by applying a few line Gauss-Seidel sweeps to the 'Poisson' equation (3.3c) for δp, and then obtaining δu, δv from equations (3.3a-b). The updated solution q̄_h is then defined by:

$$ \bar u_h = u_h + \omega_{uv}\,\delta u, \qquad \bar v_h = v_h + \omega_{uv}\,\delta v, \qquad \bar p_h = p_h + \omega_p\,\delta p, $$

where ω_uv and ω_p are (under-)relaxation parameters.

This procedure is applied after a few line relaxation sweeps of the momentum equations, using a relaxation parameter ω_mom. For β = 0 the usual SIMPLE algorithm is obtained. Putting β = 1 we have a new class of pressure-correction methods which neglect fewer terms of (3.2). Since the momentum residuals are calculated as part of their relaxation procedure, the additional work involved in implementing the method with β = 1 is not great. This method will be discussed more fully in a forthcoming paper.
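A minimal sketch of one pressure-correction step (3.3a-c), followed by the relaxed update, may make the data flow concrete. It rests on simplifying assumptions that are not made in the paper:

- a doubly periodic staggered grid;
- constant diagonal coefficients d_h^u = d_h^v = d;
- an exact FFT solve of the 'Poisson' equation in place of line Gauss-Seidel sweeps, which matches the exact-solve assumption used in section 4.2.

The momentum residuals r^u, r^v are passed in as given arrays, and β switches between SIMPLE (β = 0) and the variant with β = 1. All function names are hypothetical.

```python
import numpy as np

# Periodic staggered layout assumed for this sketch:
#   p[i, j] at cell centres, u[i, j] on the east face and v[i, j] on the north face of cell (i, j).


def dx_face(u, h):
    """delta_h^x of a u-field; the result lives at cell centres."""
    return (u - np.roll(u, 1, axis=0)) / h


def dy_face(v, h):
    """delta_h^y of a v-field; the result lives at cell centres."""
    return (v - np.roll(v, 1, axis=1)) / h


def dx_centre(p, h):
    """delta_h^x of a cell-centred field; the result lives at u points."""
    return (np.roll(p, -1, axis=0) - p) / h


def dy_centre(p, h):
    """delta_h^y of a cell-centred field; the result lives at v points."""
    return (np.roll(p, -1, axis=1) - p) / h


def solve_poisson_periodic(rhs, h):
    """Exact FFT solve of (delta^x delta^x + delta^y delta^y) x = rhs, returning zero-mean x."""
    n0, n1 = rhs.shape
    k0 = np.arange(n0)[:, None]
    k1 = np.arange(n1)[None, :]
    sym = (2 * np.cos(2 * np.pi * k0 / n0) + 2 * np.cos(2 * np.pi * k1 / n1) - 4) / h ** 2
    sym[0, 0] = 1.0                  # constant mode: avoid division by zero
    x_hat = np.fft.fft2(rhs) / sym
    x_hat[0, 0] = 0.0                # the pressure is only defined up to a constant
    return np.real(np.fft.ifft2(x_hat))


def pressure_correction(u, v, p, ru, rv, fp, d, h, beta=0.0, w_uv=1.0, w_p=1.0):
    """One step of (3.3a-c) with constant d = a_P^u = a_P^v, then the relaxed update."""
    rp = dx_face(u, h) + dy_face(v, h) - fp                   # continuity residual r^p
    # (3.3c) with constant d, multiplied through by d:
    rhs = d * rp - beta * (dx_face(ru, h) + dy_face(rv, h))
    dp = solve_poisson_periodic(rhs, h)
    du = -(dx_centre(dp, h) + beta * ru) / d                  # (3.3a)
    dv = -(dy_centre(dp, h) + beta * rv) / d                  # (3.3b)
    return u + w_uv * du, v + w_uv * dv, p + w_p * dp


rng = np.random.default_rng(0)
n, h, d = 16, 1.0 / 16, 4.0
u, v, p = rng.standard_normal((3, n, n))
ru, rv = rng.standard_normal((2, n, n))          # stand-ins for the momentum residuals
fp = np.zeros((n, n))                            # zero-mean source keeps (3.3c) solvable
u1, v1, p1 = pressure_correction(u, v, p, ru, rv, fp, d, h, beta=1.0)
print(np.abs(dx_face(u1, h) + dy_face(v1, h) - fp).max())   # near zero (rounding level)
```

With ω_uv = 1, the corrected velocities satisfy the discrete continuity equation for either value of β. β only changes how the momentum residuals are folded into δp and into the velocity corrections.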


4.  FOURIER  ANALYSIS  OF  PRESSURE-CORRECTION  METHODS 

In this section we use local mode analysis (introduced by Brandt[1]) to examine the SIMPLE pressure-correction algorithm, with β=0 and β=1, from the point of view of its smoothing ability. The reduction of high-frequency error components is a local process, dependent principally on the local difference star. Thus the analysis of this reduction need not take account of distant boundaries or varying difference stars.


Consider an arbitrary local section of the mesh with velocity and pressure distributions as shown in figure 1. Assume that at the start of the iteration the errors in u, v, p are given by (e_0^u, e_0^v, e_0^p), and that the θ = (θ_1, θ_2) components of the errors are defined by

$$ \begin{pmatrix} e^u_\theta \\ e^v_\theta \\ e^p_\theta \end{pmatrix} = \alpha_\theta \exp(i\theta\cdot x/h), \qquad \alpha_\theta = \begin{pmatrix} \alpha^u_\theta \\ \alpha^v_\theta \\ \alpha^p_\theta \end{pmatrix}, $$

where θ·x/h = (θ_1 x + θ_2 y)/h.

After the first stage of SIMPLE (momentum relaxation) the error amplitudes have become

$$ \alpha^{(1)}_\theta \exp(i\theta\cdot x/h), $$

and after the second stage (pressure correction)

$$ \alpha^{(2)}_\theta \exp(i\theta\cdot x/h). $$

Our aim is to find the amplification matrix A, which is defined by

$$ \alpha^{(2)}_\theta = A\,\alpha_\theta, \qquad A = A_2 A_1, $$

where A_1 and A_2 are the amplification matrices for stages 1 and 2 respectively. The smoothing factor will then be given by

$$ \bar\mu = \sup_{\theta\in\Theta_{high}} [\rho(A)] \qquad (4.1) $$

where Θ_high = [-π,π]² \ (-π/2, π/2)² is the set of 'high' frequencies. In the case of convection-dominated flows the definition

$$ \bar\mu = \sup_{|u_0|\le 1,\ |v_0|\le 1,\ \theta\in\Theta_{high}} [\rho(A)] \qquad (4.2) $$

may be more relevant, where u_0, v_0 are the frozen velocities used to linearise the problem. In this case u_0, v_0 are constrained in order to maintain the relevant Reynolds number.

4.1  Momentum  Relaxation 

Smoothing analysis of line relaxation methods is straightforward and can be found in Brandt[1]. Thus the amplification matrix A_1 of the first stage of the pressure-correction method is easily calculated. This is given in Shaw and Sivaloganathan[5] for the discretisation of Patankar and Spalding[4].


4.2  Pressure-Correction 

A Fourier analysis of the pressure-correction stage of the SIMPLE algorithm for β=0 and β=1 has been made. Smoothing factors resulting from this analysis are presented in section 4.3. In this section we illustrate the technique used to obtain these results by analysing the case β=0. The case β=1 is analogous but rather more involved. Assuming that the 'Poisson' equation (3.3c) is solved exactly for δp, the pressure correction described in section 3 for β=0 amplifies the errors as follows:


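$$ \begin{pmatrix} e^u \\ e^v \\ e^p \end{pmatrix} \longmapsto T_h \begin{pmatrix} e^u \\ e^v \\ e^p \end{pmatrix}, \qquad T_h = \begin{pmatrix} I_h - \omega_{uv}(d_h^u)^{-1}\delta_h^x P_h^{-1}\delta_h^x & -\omega_{uv}(d_h^u)^{-1}\delta_h^x P_h^{-1}\delta_h^y & 0 \\ -\omega_{uv}(d_h^v)^{-1}\delta_h^y P_h^{-1}\delta_h^x & I_h - \omega_{uv}(d_h^v)^{-1}\delta_h^y P_h^{-1}\delta_h^y & 0 \\ \omega_p P_h^{-1}\delta_h^x & \omega_p P_h^{-1}\delta_h^y & I_h \end{pmatrix}, $$

where

$$ P_h = \delta_h^x(d_h^u)^{-1}\delta_h^x + \delta_h^y(d_h^v)^{-1}\delta_h^y $$

is the 'Poisson' operator, I_h the identity operator, and ω_uv, ω_p are relaxation parameters for updating velocities and pressure respectively.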

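The amplification matrix A_2 is obtained by taking the discrete Fourier transform of T_h. Thus:

$$ A_2 = \begin{pmatrix} 1 + \dfrac{4\omega_{uv}s_1^2}{a_P^u\hat P_h(\theta)h^2} & \dfrac{4\omega_{uv}s_1s_2}{a_P^u\hat P_h(\theta)h^2} & 0 \\ \dfrac{4\omega_{uv}s_1s_2}{a_P^v\hat P_h(\theta)h^2} & 1 + \dfrac{4\omega_{uv}s_2^2}{a_P^v\hat P_h(\theta)h^2} & 0 \\ \dfrac{2i\omega_p s_1}{h\hat P_h(\theta)} & \dfrac{2i\omega_p s_2}{h\hat P_h(\theta)} & 1 \end{pmatrix} \qquad (4.3) $$

where P̂_h(θ) is the symbol of P_h,

$$ \hat P_h(\theta) = \frac{e^{i\theta_1} + e^{-i\theta_1} - 2}{a_P^u h^2} + \frac{e^{i\theta_2} + e^{-i\theta_2} - 2}{a_P^v h^2}. $$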
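The matrix A_2 can be evaluated directly. The sketch below computes it as the Fourier symbol of T_h for β = 0, using δ_h^x ↦ 2i s_1/h and δ_h^y ↦ 2i s_2/h with the same half-angle sines s_1, s_2 as above. The third column is (0, 0, 1)^T because, for β = 0, δp does not depend on the pressure error. The sample values of a_P^u, a_P^v, h and the relaxation parameters are illustrative only, and the function name is hypothetical.

```python
import numpy as np


def simple_symbol_beta0(theta1, theta2, aPu, aPv, h, w_uv, w_p):
    """A_2: Fourier symbol of the beta = 0 pressure-correction operator T_h."""
    s1, s2 = np.sin(theta1 / 2.0), np.sin(theta2 / 2.0)
    # Symbol of the 'Poisson' operator P_h; exp(i t) + exp(-i t) - 2 = 2 cos t - 2.
    P = (2 * np.cos(theta1) - 2) / (aPu * h ** 2) + (2 * np.cos(theta2) - 2) / (aPv * h ** 2)
    return np.array([
        [1 + 4 * w_uv * s1 ** 2 / (aPu * P * h ** 2), 4 * w_uv * s1 * s2 / (aPu * P * h ** 2), 0],
        [4 * w_uv * s1 * s2 / (aPv * P * h ** 2), 1 + 4 * w_uv * s2 ** 2 / (aPv * P * h ** 2), 0],
        [2j * w_p * s1 / (h * P), 2j * w_p * s2 / (h * P), 1],
    ], dtype=complex)


A2 = simple_symbol_beta0(2.0, 1.0, aPu=3.0, aPv=3.0, h=0.1, w_uv=1.0, w_p=1.0)
B = A2[:2, :2]                                   # velocity-velocity block
print(np.allclose(B @ B, B))                     # True: B is a projector
print(np.abs(np.linalg.eigvals(A2)))             # spectral radius of A_2 alone is 1
```

With ω_uv = 1 and a_P^u = a_P^v, the velocity block annihilates the error component parallel to (s_1, s_2), which is the only part seen by the discrete continuity equation. It leaves the orthogonal component untouched, and the pressure error itself is passed through unchanged. Damping of those components has to come from the momentum relaxation A_1, which is why the smoothing factor is computed for the product A = A_2 A_1.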


4.3  Smoothing  Factors 

The smoothing factor of the SIMPLE algorithm is calculated as follows. The amplification matrix A = A_2A_1 is easily calculated. A is a 3 by 3 complex non-Hermitian matrix. Its amplification factor ρ(θ) = ρ(A) is found using a NAG routine for the eigenvalues of a general matrix. The smoothing factor defined by equation (4.1) is then found by embedding the calculation of ρ(θ) in a NAG routine for linearly constrained minimisation. The definition of μ̄ for highly convective flow given in equation (4.2) was found to be too costly to evaluate. As an alternative we define the smoothing factor to be:

$$ \bar\mu = \max_{(u_0,v_0)\in V}\Big\{\sup_{\theta\in\Theta_{high}} [\rho(A)]\Big\}, $$

where

$$ V = \{(0,1), (1,1), (1,0), (1,-1), (0,-1), (-1,-1), (-1,0), (-1,1)\} $$

is a set of flow directions of interest.
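The direction-restricted smoothing factor can also be approximated by brute force. The sketch below replaces the NAG minimiser with a grid search over the high frequencies. This is simpler to write but only approximates the supremum. A_1 is not reproduced in this paper, so the demonstration plugs in a standard stand-in: lexicographic point Gauss-Seidel for the five-point Laplacian, whose smoothing factor is known to be 0.5. The stand-in ignores (u_0, v_0), so the maximum over V is trivial here. Function names are hypothetical.

```python
import numpy as np


def high_frequency_grid(n=201):
    """Sample Theta_high = [-pi, pi]^2 minus (-pi/2, pi/2)^2 on a uniform grid."""
    t = np.linspace(-np.pi, np.pi, n)
    T1, T2 = np.meshgrid(t, t, indexing="ij")
    high = (np.abs(T1) >= np.pi / 2) | (np.abs(T2) >= np.pi / 2)
    return T1[high], T2[high]


def smoothing_factor(amp_matrices, directions, n=201):
    """max over flow directions of (grid-search) sup over Theta_high of rho(A(theta)).

    amp_matrices(t1, t2, u0, v0) takes 1D arrays of frequencies and returns a stack
    of amplification matrices with shape (len(t1), m, m).
    """
    T1, T2 = high_frequency_grid(n)
    mu = 0.0
    for u0, v0 in directions:
        A = amp_matrices(T1, T2, u0, v0)
        rho = np.abs(np.linalg.eigvals(A)).max(axis=-1)   # spectral radius per theta
        mu = max(mu, float(rho.max()))
    return mu


V = [(0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1)]


def gauss_seidel_poisson(t1, t2, u0, v0):
    """Stand-in 1x1 'matrix': lexicographic Gauss-Seidel for the 5-point Laplacian."""
    g = (np.exp(1j * t1) + np.exp(1j * t2)) / (4 - np.exp(-1j * t1) - np.exp(-1j * t2))
    return g[:, None, None]


print(smoothing_factor(gauss_seidel_poisson, V))   # close to the known value 0.5
```

For the SIMPLE analysis, amp_matrices would return the 3 by 3 product A_2 A_1 evaluated for each (u_0, v_0) in V.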
FIG. 3  Amplification Factor at Mesh Reynolds Number 1

FIG. 4  Amplification Factors at Mesh Reynolds Number 100 (panels: (u_0, v_0) = (0,1), (1,1), (1,0))


Smoothing factors are presented in figures 3-7. The relevant relaxation parameters used for each figure appear in tables 1-2. These parameters have been optimised empirically as far as possible. Figures 3-4 give contours and isometric plots of the amplification factor ρ(θ) for alternating symmetric line relaxation of the momentum equations followed by (β=0) pressure correction. The contour j represents ρ(θ) = j/10. In figure 3 the mesh Reynolds number is Re_h = 1 (Reynolds number 66). The smoothing factor of .635 is independent of flow direction since the flow is dominated by diffusion. This smoothing factor is quite satisfactory and allows the construction of efficient multigrid procedures. Figure 4 depicts ρ(θ) at Re_h = 100 (Reynolds number 6600) for three different flow orientations. In this case the flow is dominated by convection, and ρ(θ) is consequently strongly dependent on flow direction. The effect of the near loss of discrete ellipticity is illustrated admirably by figure 4. The amplification factor is well behaved except along lines of the θ plane perpendicular to the flow direction, for which ρ(θ) is close to 1. These are precisely the lines along which ellipticity is most nearly lost. The resultant poor smoothing is thus a feature of the discretisation rather than of the smoothing method. However, when conclusions are drawn from the results of the smoothing analysis at high Reynolds numbers, it should be borne in mind that the analysis is local in nature and does not take into account the effects of boundary conditions.

Figures 5-7 show the smoothing factor μ̄ for values of (u_0, v_0) ∈ V at mesh Reynolds numbers 0, 15.2 and 151.5, corresponding to Reynolds numbers 1, 1000 and 10000 respectively. In each figure the results for β=0 and β=1 are given. The analysis aims to model the empirical results of Sivaloganathan and Shaw[6] discussed in section 5. It is clear that the use of β=1 considerably improves smoothing factors, at least up to Re=1000, and renders them less dependent on flow direction.


FIG. 7  Smoothing Factors for Velocities in V at Re = 10000 (panels: β=0, β=1)


Shaw and Sivaloganathan

5.  A  COMPARISON  OF  THEORETICAL  AND  PRACTICAL  SMOOTHING  ANALYSIS 

Some doubt exists as to the validity of theoretical smoothing analysis in the case of convection-dominated flow. This is principally due to the linearisation, which assumes a constant velocity field when evaluating the nonlinear terms in equations (2.1). In practice these velocities are locally variable and dependent on the current solution.

Given a multigrid method for solution of the problem under consideration, a practical smoothing analysis may be made in order to test the validity of the theoretical analysis. This is possible because the theoretical smoothing factor is defined to be the asymptotic convergence rate of each smoothing iteration in a two-grid method with a perfect coarse-grid correction, i.e. a coarse-grid correction which annihilates all error components in the 'low' frequency range θ ∈ Θ_low = (-π/2, π/2)². It is not difficult to simulate this type of method in practice.

A multigrid method is used to solve the problem on a finest grid denoted by Ω_m. Progressively coarser grids Ω_k, k = m-1(-1)1, are defined in a natural manner. The multigrid method used has ν_1 pre-relaxations, ν_2 post-relaxations and γ_k coarse-grid corrections on each grid Ω_k.

- γ_m is chosen to be 1, as in a conventional multigrid method (where γ=1 or γ=2 are the usual choices).
- γ_{m-1} is chosen to be much larger, so that the coarse-grid correction for Ω_m is solved almost exactly.
- On coarser grids, γ_k = 1, k = m-2(-1)1, is a reasonable definition.

This method simulates the situation described above. The practical smoothing factor μ_p is therefore defined by

$$ \mu_p = (\kappa_{mg})^{1/(\nu_1+\nu_2)}, $$

where κ_mg is the asymptotic convergence rate of the multigrid method described above. If the theoretical analysis is valid, μ̄ and μ_p will be in close agreement.
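The definition of μ_p translates directly into code once κ_mg has been estimated. How κ_mg was measured is not stated here. The sketch assumes it is the geometric mean of the residual reduction per cycle over the last few cycles of a run. The residual history in the example is synthetic, not data from the paper, and the function names are hypothetical.

```python
def asymptotic_rate(residual_norms, tail=5):
    """Average reduction factor per cycle over the last `tail` cycles (estimate of kappa_mg)."""
    r = residual_norms
    return (r[-1] / r[-1 - tail]) ** (1.0 / tail)


def practical_smoothing_factor(residual_norms, nu1, nu2, tail=5):
    """mu_p = kappa_mg ** (1 / (nu1 + nu2))."""
    return asymptotic_rate(residual_norms, tail) ** (1.0 / (nu1 + nu2))


# Synthetic residual history (not data from the paper): reduction by 0.25 per cycle.
history = [0.25 ** k for k in range(12)]
print(practical_smoothing_factor(history, nu1=1, nu2=1))   # approximately 0.5
```

Taking the ratio over several late cycles, rather than a single one, damps the effect of cycle-to-cycle fluctuations on the estimate.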
Table 3 gives values of μ_p for the experiments described in Sivaloganathan and Shaw[6], together with the minimum and maximum theoretical smoothing rates over velocity fields in V. These three rates are also depicted in figure 8. It is clear that the minimum theoretical smoothing rate accurately models the practical behaviour of SIMPLE as a smoothing method, even for high Reynolds numbers. Note from figure 4 that μ_max always occurs for velocity fields aligned with grid lines. For practical flows in which there is no persistent alignment between streamlines and grid lines, one would therefore expect the method to behave as predicted by μ_min. This indeed seems to be the case for the present recirculating flow problem, the shear-driven cavity. However, even in cases of strong alignment μ_max is expected to be a pessimistic prediction of the practical smoothing rate.

Figure 8 also demonstrates the divergence of the two theoretical rates which occurs at the onset of upwinding, at mesh Reynolds number 2, corresponding to Reynolds number 132.


TABLE 1  Relaxation Factors and Smoothing Factors For Figures 3-4 (β=0)

Fig. No   (u_0, v_0)   Mesh Re   ω_mom   μ̄
3         (0,1)        1         .42     .635
4         (0,1)        100       .15     .987
4         (1,1)        100       .15     .714
4         (1,0)        100       .15     .987

In all cases ω_uv = 1 = ω_p.
FIG. 8  Comparison of Theoretical and Practical Smoothing Rates


TABLE 2  Relaxation Factors and Smoothing Factors For Figures 5-7

Fig. No   Re      β   ω_mom   ω_uv   ω_p    μ_min   μ_max
5         1       0   0.50    1.00   1.00   .603    .604
5         1       1   0.25    1.00   1.20   .491    .491
6         1000    0   0.25    1.00   1.00   .720    .935
6         1000    1   0.50    1.00   0.70   .678    .745
7         10000   0   0.15    1.00   1.00   .839    .996
7         10000   1   0.80    0.35   0.35   .841    .936
TABLE 3  Theoretical and Practical Smoothing Factors

Reynolds No   μ_min   μ_max   μ_p
1             .603    .604    .478
100           .630    .664    .558
400           .592    .774    .588
1000          .720    .935    .725
5000          .839    .992    .840
10000         .839    .996    .910
INTRODUCTION

Calculations of electromagnets present a special case of mildly nonlinear elliptic equations. Difficulties arise from the change of coefficients by several orders of magnitude across the material boundaries. Therefore, some extra work has to be done in smoothing and intergrid transfers near the material boundaries. Parallel versions may utilize a splitting along these boundaries and assign the 'normal' and 'extra' parts of the calculations to separate processors.

The three-dimensional calculation of the fields in an electromagnet is part of many design procedures in electrical engineering. Existing software such as PROFI [2] and TOSCA [5] performs quite satisfactorily for coarse meshes, but because of the embedded solution algorithms (BSOR, ICCG, sparse direct solvers) its performance is not satisfactory for big problems. A multigrid approach should reduce the computation time considerably. To allow efficient use of the MIMD computers being developed, some care is taken to allow simple partitioning of the computations.

The program is still in the development process. A few problems have been set up to test the treatment of lamination effects and of interfaces. The algorithm performed well, but the set of test cases is too small for general claims on the quality of the program.

1.  The  analytical  formulation  of  the  problem 

The equations to be solved are Maxwell's equations, which are used in integral form [7]:

$$ \oint_{\partial A} H\cdot ds = \iint_A J\cdot dA \quad \text{for all surfaces } A \qquad (1) $$

$$ \oint_{\partial V} B\cdot dA = 0 \quad \text{for all volumes } V \qquad (2) $$

$$ B = \mu(|H|)\,H \qquad (3) $$

J is required to be source-free.

The natural domain of the equations is the entire space; reductions to finite regions are obtained by symmetry conditions and by analytic approximations for the far field.

To reduce the number of unknowns, the field H is split into

$$ H = H' - \mathrm{grad}\,\phi \qquad (4) $$

with

$$ \oint_{\partial A} H'\cdot ds = \iint_A J\cdot dA \quad \text{for all surfaces } A \qquad (5) $$

$$ \oint_{\partial V} \mu\,\mathrm{grad}\,\phi\cdot dA = \oint_{\partial V} \mu H'\cdot dA \quad \text{for all volumes } V \qquad (6) $$

The usual approach is to calculate H' as the corresponding vacuum field, given by the Biot-Savart law

$$ H'(x) = \frac{1}{4\pi}\iiint_V J(x')\times\frac{x - x'}{|x - x'|^3}\,dV' \qquad (7) $$

This is a simple but time-consuming activity. This definition of H' also gives H' ≈ grad φ in the iron part and therefore leads to large cancellation errors. However, (5) determines H' only up to a gradient field, so there is sufficient freedom to choose H' in such a way that:

- H' is easy to compute;
- H' is nonzero only for a small number of points;
- H' is rather small, and quite different from grad φ.

Equation (6) for φ is very similar to Poisson's equation, the main difference being the treatment of the boundaries and interfaces.


2.  The  discretization  of  the  equations 

Two different meshes are used in the discretization. Both meshes are defined by variable-distance coordinate planes. The geometry and the material filling are described in terms of the primary mesh. All boundaries and interfaces are assumed to be linear interpolants of mesh points. The components of B, H and H' are located on the edges of the mesh cells parallel to the components, and the values of φ at the nodes of the mesh. The materials are attributed to (fully or partially filled) mesh cells. The integrals in (5) are evaluated on the primary mesh by a midpoint rule.

The dual mesh is shifted relative to the primary mesh by half a step in every direction. The integrals in (6) are evaluated on the dual mesh. The component of grad φ normal to the plane of integration is evaluated from φ by a two-point difference formula; the other components are not needed.

This allocation of values and integrals gives a fully consistent discretization of Maxwell's equations. For homogeneous material the discrete equations are identical to the usual discretization of the differential equations curl H' = J and Δφ = div H', but in this formulation there is no need for a special treatment of boundaries and interfaces. The discretization at the interfaces turns out to be slightly different from the differential-equation approach and gives the same accuracy at the interface as in the interior [7].

3.  The  calculation  of  H' 

The calculation of H' may be done by a simple multi-level procedure:

Start: Find the smallest box containing all currents J and all nonzero boundary values of H'. Define H' on those edges of the box where H' is not given by boundary conditions, in such a way that (5) is fulfilled for all six surfaces of the box (if possible, choose H' constant or zero on entire edges).

Refinement step: Add one grid plane, cutting the box into two parts. This adds nine equations (possible surfaces for integration) and four unknowns (values of H' on the newly defined edges). One of the equations contains all four new values of H'; each of the other eight contains just one of them. Solving four of these eight equations for the four new values of H' will satisfy all nine equations, because (5) was already fulfilled for all entire surfaces and J is source-free.

Further process: This refinement step may be repeated iteratively for all volumes defined so far, until the full mesh and all of H' are defined.

Parallelization of the calculation of H'. The refinement step is always local to a brick-shaped part of the volume. As soon as values of H' have been assigned to the edges of different bricks, the further calculations on these bricks are completely independent and may be assigned to different processors without problems. Whether the values common to two processors are computed just once and passed on to the other processors, or computed independently by all processors, is up to the implementation. This choice should depend on the relative speed of computation and communication.




Electromagnets  and  MIMD  Computers 


4. The calculation of φ

The calculation of φ requires the solution of a nonlinear elliptic second-order system of equations with varying (mostly Neumann) boundary conditions. Problems are due to the discontinuity of μ across material boundaries and to the complicated geometries involved. For many practical applications, the mesh size will be just barely small enough to resolve small features like grooves and current loops. It is in general not possible to describe the same physical problem on a substantially coarser mesh.

4.1. The definition of coarse-mesh problems and intergrid transfers.

For as many grids as possible, the coarse-grid problems should be defined by simply setting up the same equations on the coarser grid. For the reasons given above, however, this process is restricted to very few steps, and the coarsest mesh generated this way is usually much too fine to be an appropriate coarsest grid for a multigrid process. Therefore a more algebraic construction of further coarse-grid problems is necessary.

Interpolation and restriction. The interpolation is done in three steps:

1. interpolate the coarse-grid edge centers, with the weights taken from the appropriate arm of the difference star at the interpolation point;
2. fill in the coarse-grid plane centers using a 2D restriction of the difference star;
3. compute the coarse-grid cell centers using the full difference star.

Restriction is defined as the adjoint of interpolation times a diagonal matrix, chosen to ensure that restriction and interpolation cancel out on constant fields.
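A one-dimensional analogue may make the two ingredients concrete. It is a simplification of the three-dimensional procedure above, which treats edge, plane and cell centres in turn. For -(a φ')' on a 1D grid, each fine-only node is interpolated with weights taken from the west and east arms of its difference star. Restriction is then the transpose of interpolation, scaled by a diagonal matrix so that constants restrict to constants. Function names and coefficient values are illustrative assumptions.

```python
import numpy as np


def interpolation_1d(a):
    """Operator-dependent prolongation for -(a phi')' = f on a uniform 1D grid.

    a[c] is the coefficient of fine cell c (between fine nodes c and c+1); len(a) is even.
    Coarse node k coincides with fine node 2k.
    """
    nf, nc = len(a) + 1, len(a) // 2 + 1
    P = np.zeros((nf, nc))
    for k in range(nc):
        P[2 * k, k] = 1.0                          # coincident nodes are copied
    for j in range(1, nf, 2):                      # fine-only nodes
        a_w, a_e = a[j - 1], a[j]                  # west and east arms of the star at j
        P[j, (j - 1) // 2] = a_w / (a_w + a_e)
        P[j, (j + 1) // 2] = a_e / (a_w + a_e)
    return P


def restriction_from(P):
    """Transpose of P scaled by a diagonal so that constants restrict to constants."""
    return np.diag(1.0 / P.sum(axis=0)) @ P.T


a = np.array([2.0, 2.0, 2.0, 1000.0, 1000.0, 1000.0])   # illustrative jump between cells 2 and 3
P = interpolation_1d(a)
R = restriction_from(P)
print(P[3])                                      # node 3 follows the high-coefficient side
print(R @ P @ np.ones(P.shape[1]))               # all ones
```

Because each row of P sums to one and R is scaled column-wise, a constant field passes through interpolation and restriction unchanged. The interpolated value at the interface node is dominated by the high-coefficient neighbour.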

The coarse-grid matrix. The coarse-grid star at a point is constructed from the fine-grid star at the same point and from the interpolation, by inserting the interpolation formula for the edge centers into the fine-grid difference equation. This gives a proper coupling across interfaces that are not aligned with coarse mesh points. The coarse-grid equations generated this way have the same simple structure as the fine-grid equations. A full Galerkin procedure to construct the coarse-grid matrix would give somewhat better coarse-grid solutions at the cost of more complicated equations, and is probably not worthwhile.

The cycling process. As the coarser grids constructed this way do not yield very good approximations to the fine-grid solution, a W-cycle is preferred over a V-cycle. A fully adaptive cycle, solving any coarse grid to 10% of the next finer grid residual, has been tried but is not substantially better than a W-cycle. For the grid levels defined physically, a V-cycle is sufficient.
4.2. The treatment of the nonlinearity.

The value of μ does not depend on φ alone but on the full value of H. This value is readily available only on a mesh allowing a direct formulation of the physical problem. Therefore a Newton method for the nonlinearity seems natural. The update procedure for μ is the same as in the established 'Profi' code [2]. For further development, it is planned to treat the nonlinearity with an FAS method on all grids where local values of μ are meaningful.

The  smoothing  procedure. 

Away from the boundaries and interfaces the difference star is the same as for Poisson's equation, so simple red-black Gauss-Seidel is the most effective smoother there, giving a smoothing rate of about 0.25. The smoothing rate near the boundary depends on the precise form of the boundary conditions. It is in general a little higher, but not by much, so the boundaries do not need special treatment.

The situation near the material interface is quite different. Here the value of μ may change by more than four orders of magnitude across the interface, and the difference star involved will be very asymmetric, e.g. in two dimensions

$$ 2004\,\phi(x) = 1000\,\phi(x+\Delta x) + 2\,\phi(x-\Delta x) + 501\,\phi(x+\Delta y) + 501\,\phi(x-\Delta y) \qquad (8) $$
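The star (8) is consistent with a simple assumption, not stated in the paper: each arm weight is the mean permeability of the two cells sharing that mesh edge, with μ = 1000 in the iron and μ = 2 outside. The sketch below builds such arm weights, confirms that they reproduce (8) at a node on a vertical interface, and runs the plain red-black Gauss-Seidel sweep described above. For simplicity it uses Dirichlet boundaries and applies no special interface treatment. All names are hypothetical.

```python
import numpy as np


def arm_weights(mu):
    """Arms (east, west, north, south) of the 5-point star at each interior node.

    mu[i, j] is the permeability of mesh cell (i, j); nodes are (i, j), 0 <= i <= nx,
    0 <= j <= ny. Hypothetical rule: an arm weight is the mean permeability of the two
    cells sharing that mesh edge (uniform mesh, factors of h dropped).
    """
    nx, ny = mu.shape
    aE, aW, aN, aS = (np.zeros((nx + 1, ny + 1)) for _ in range(4))
    for i in range(1, nx):
        for j in range(1, ny):
            aE[i, j] = 0.5 * (mu[i, j - 1] + mu[i, j])
            aW[i, j] = 0.5 * (mu[i - 1, j - 1] + mu[i - 1, j])
            aN[i, j] = 0.5 * (mu[i - 1, j] + mu[i, j])
            aS[i, j] = 0.5 * (mu[i - 1, j - 1] + mu[i, j - 1])
    return aE, aW, aN, aS


def residual(phi, rhs, arms):
    """r = rhs - (aC*phi - sum of arm*neighbour), aC = sum of arms; zero on the boundary."""
    aE, aW, aN, aS = arms
    aC = aE + aW + aN + aS
    r = np.zeros_like(phi)
    c = (slice(1, -1), slice(1, -1))
    r[c] = (rhs[c] - aC[c] * phi[c]
            + aE[c] * phi[2:, 1:-1] + aW[c] * phi[:-2, 1:-1]
            + aN[c] * phi[1:-1, 2:] + aS[c] * phi[1:-1, :-2])
    return r


def red_black_sweep(phi, rhs, arms):
    """One red-black Gauss-Seidel sweep in place; boundary values are held fixed."""
    aE, aW, aN, aS = arms
    aC = aE + aW + aN + aS
    for colour in (0, 1):
        for i in range(1, phi.shape[0] - 1):
            for j in range(1, phi.shape[1] - 1):
                if (i + j) % 2 == colour:
                    phi[i, j] = (rhs[i, j]
                                 + aE[i, j] * phi[i + 1, j] + aW[i, j] * phi[i - 1, j]
                                 + aN[i, j] * phi[i, j + 1] + aS[i, j] * phi[i, j - 1]) / aC[i, j]
    return phi


mu = np.full((8, 8), 2.0)
mu[4:, :] = 1000.0                           # iron for cells i >= 4, mu = 2 elsewhere
arms = arm_weights(mu)
print([float(a[4, 4]) for a in arms])        # [1000.0, 2.0, 501.0, 501.0], i.e. star (8)

rng = np.random.default_rng(1)
phi = np.zeros((9, 9))
phi[1:-1, 1:-1] = rng.standard_normal((7, 7))
rhs = np.zeros_like(phi)
r0 = np.linalg.norm(residual(phi, rhs, arms))
for _ in range(10):
    red_black_sweep(phi, rhs, arms)
print(np.linalg.norm(residual(phi, rhs, arms)) / r0)   # residual reduction after 10 sweeps
```

In this assumed setup, the weak west arm (2) comes from the two non-iron cells west of the node. Such nodes are nearly decoupled from the non-iron side, which matches the view of the iron as an almost-Neumann subproblem.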

The appropriate view for the construction of the smoother is to think of the whole problem as two coupled parts:

- a Poisson problem in the iron with mixed (almost Neumann) boundary conditions, which is (almost) independent of anything going on in the non-iron part;
- a Poisson problem with Dirichlet boundary data in the non-iron part, which gets its Dirichlet data from the solution of the problem in the iron.

The non-iron part does not require any special measures, while the iron part does.

Smoothing  for  the  iron  region.  The  iron  region  may  exhibit  three  separate  features  requiring 
special  attention  in  the  smoothing  process. 

Laminated Iron. For laminated iron, the value of μ depends on the direction. At the moment, lamination is allowed only in the direction of a coordinate. For normal technical problems, this is not a severe restriction. If the region of lamination is only a few mesh cells in size, red-black relaxation is still good, but for larger regions zebra plane relaxations are appropriate. The best organization scheme is probably to do an overall red-black relaxation and then add a local plane relaxation for the region of lamination only.

Reentrant edges. The magnet iron normally has reentrant edges with a 90° angle. These may or may not be in areas of lamination. In any case, one extra relaxation of a few points near the edges after every global relaxation improves the smoothing rate almost to the rate for convex regions [1].
The Interface. The iron/non-iron interface produces the most serious problems for the smoother. Simply ignoring it gives a smoothing rate of about 0.7, which would require too much work on fine grids. On the other hand, the number of interface points is negligible relative to the total number of points only for very fine grids, so any measure to improve the smoothing rate should not need too much extra work per point.

- On coarse grids (total number of points not more than 10 times the number of interface points) the best strategy is simply to perform extra relaxations on the entire grid.
- On intermediate grids (total number of points 10 to 50 times the number of interface points), one or two local relaxations per full sweep are advisable, covering only the interface and one layer of points to each side.
- On fine grids the interface deserves a more thorough treatment, e.g. solving for blocks containing the interface and two layers to each side with every global relaxation sweep.

If  the  iron  is  laminated  in  a  direction  crossing  the  interface,  all  the  extra  calculations  at  the 
interface  may  be  done  separately  for  different  relaxation  planes,  as  different  planes  are  only 
weakly  coupled. 

The distinctions between coarse, intermediate and fine grids are tentative ones and certainly hardware-dependent; for pipeline machines the limits are higher than for purely sequential machines.
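The tentative thresholds above can be written as a small selection rule. The sketch encodes them literally: a ratio of total to interface points of at most 10, between 10 and 50, or above 50. The function name and the returned labels are hypothetical, and, as the text notes, the limits themselves are hardware-dependent.

```python
def interface_strategy(n_total, n_interface):
    """Select the extra interface work per sweep from the ratio of total to interface points.

    Thresholds 10 and 50 follow the tentative limits in the text; labels are illustrative.
    """
    ratio = n_total / n_interface
    if ratio <= 10:
        return "coarse: extra relaxations on the entire grid"
    if ratio <= 50:
        return "intermediate: 1-2 local relaxations of interface points plus one layer each side"
    return "fine: block solves (interface plus two layers each side) with every global sweep"


for n_total in (5_000, 30_000, 200_000):
    print(n_total, interface_strategy(n_total, n_interface=1_000))
```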

5. Parallelizing the computation of φ.

Methods of parallelizing algorithms are certainly hardware-dependent. The considerations here are based on a loosely coupled system of moderately powerful processors communicating via message passing, where the number of messages is normally of more importance than their size, e.g. the SUPRENUM concept [6]. At the moment, however, these machines are not readily available, so the considerations are of a rather theoretical nature. The feasibility of the concepts has been demonstrated with simulators and experimental machines [3,4], but efficiency can be proved only with proper hardware.

The considerations for parallelizing the computation of φ are basically the same as for Poisson's equation. In the present situation this means assigning each processor a certain part of the geometry, possibly with some overlap, and exchanging the values of the solution at the boundaries of these regions when needed. The choice of partitioning of the geometry should balance the load for the processors and minimize the communication overhead. This implies that it may well be best to partition in only one or two coordinate directions. A small overlap of the regions may eliminate the need to communicate with the neighboring processors after every relaxation sweep; communication may then be needed only with grid transfers.

Special  features  are  introduced  by  lamination  and  by  the  interface.  If  plane  relaxations  are 
done,  it  is  advisable  to  allocate  a  plane  to  only  one  processor  to  eliminate  communication 
requests  within  the  plane  relaxations. 
The treatment of the interface takes extra time, so the regions containing interface points should be substantially smaller than other regions. If a problem is big enough to be solved on a MIMD machine, it is certainly big enough to justify extra effort in the interface treatment. A good way to spend that effort is to assign the blocks of points around the interface that are designated for separate relaxation to separate processors.
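The load-balancing idea of the last two paragraphs can be sketched as a greedy cut of the relaxation planes into contiguous slabs, one per processor. Each plane carries a cost, and interface planes are given a larger cost so that the slabs containing them come out smaller. The per-plane costs, the greedy prefix-sum cut, the ghost-plane overlap, and the name `partition_planes` are all assumptions for illustration, not the scheme of the paper. Whole planes are assigned to a single processor, as advised above for plane relaxation.

```python
def partition_planes(costs, n_proc, overlap=1):
    """Split planes 0..len(costs)-1 into contiguous slabs of roughly equal cost.

    Greedy rule: cut after plane i once the running cost reaches k/n_proc of
    the total. Returns the owned slabs and the slabs widened by `overlap`
    ghost planes on each side, clipped to the domain.
    """
    total = float(sum(costs))
    cuts, acc, k = [0], 0.0, 1
    for i, c in enumerate(costs):
        acc += c
        if k < n_proc and acc >= k * total / n_proc:
            cuts.append(i + 1)
            k += 1
    cuts.append(len(costs))
    owned = list(zip(cuts[:-1], cuts[1:]))
    ghosts = [(max(0, a - overlap), min(len(costs), b + overlap)) for a, b in owned]
    return owned, ghosts


# 20 planes of unit cost; interface planes 9 and 10 assumed to cost 5 each.
costs = [1.0] * 20
costs[9] = costs[10] = 5.0
print(partition_planes(costs, 4)[0])  # [(0, 7), (7, 10), (10, 13), (13, 20)]
```

Each slab in this example costs 7 units, but the two slabs containing interface planes hold only 3 planes each, compared with 7 for the others.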

Conclusions. 

The multigrid solution of magnet problems certainly gives a big improvement over existing methods, but its efficiency is still not nearly as good as for Laplace's equation. There are two points where substantial improvements should be possible: the treatment of the interface within the smoothing and grid transfer processes, and the treatment of the material constants.
INTRODUCTION

The numerical prediction of buoyancy-induced flows poses special difficulties for standard numerical techniques because of the coupling between velocity and buoyancy. We present a multigrid algorithm based upon a novel relaxation scheme that handles this coupling correctly. Numerical experiments show that this approach is reasonably efficient and robust for a range of Rayleigh numbers and a variety of cycling strategies.


1.  OVERVIEW 

The multigrid concept has emerged as one of the most promising approaches to the solution of certain types of partial differential equations. Extremely fast and robust codes are available for single elliptic equations (see, for example, [6]), and the techniques have been successfully applied to some systems of elliptic PDEs. The philosophy of multigrid algorithms is, in a certain sense, to find an efficient smoother, i.e., a relaxation scheme that rapidly reduces high-frequency errors. A hierarchy of grids is then organized so that this rate of convergence applies to all error modes.
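The following sketch illustrates what "reduces high-frequency errors" means. It applies damped Jacobi to single Fourier error modes of the 1D Laplacian and measures how much of each mode survives. Damped Jacobi with ω = 2/3 is an assumed textbook smoother, not the relaxation scheme of this paper, and the helper names are hypothetical.

```python
import numpy as np

def damped_jacobi(e, omega=2.0 / 3.0, sweeps=5):
    """Relax the homogeneous 1D problem -e'' = 0 with zero Dirichlet ends.

    With a zero right-hand side the iterate *is* the error, so the result
    shows how much of the initial error survives the sweeps.
    """
    e = e.copy()
    for _ in range(sweeps):
        # The right-hand side is evaluated from old values only (Jacobi).
        e[1:-1] = (1.0 - omega) * e[1:-1] + omega * 0.5 * (e[:-2] + e[2:])
    return e

def mode_reduction(k, n=32, sweeps=5):
    """Max-norm reduction of the error mode sin(k*pi*x) on n intervals."""
    x = np.linspace(0.0, 1.0, n + 1)
    e0 = np.sin(k * np.pi * x)
    e = damped_jacobi(e0, sweeps=sweeps)
    return np.max(np.abs(e)) / np.max(np.abs(e0))

for k in (1, 8, 16, 24):
    print(k, mode_reduction(k))
```

On a 32-interval grid, five sweeps leave the smoothest mode (k = 1) almost untouched, while the k = 24 mode is reduced by more than two orders of magnitude. The coarser grids exist to deal with the smooth modes.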

The  aim  of  this  paper  is  to  present  a  multigrid  algorithm  and,  in  particular,  a  novel  relaxation 
scheme  that  is  effective  for  buoyant  flows.  This  is  an  extension  of  the  block-implicit  scheme 
developed  by  Vanka  [5]. 

Natural convection flows cause special numerical problems for iterative schemes (see, e.g., [7] and [9]). The new feature, not present in forced flows, is the coupling between the momentum equations and the temperature equation through the buoyancy source term. Conventional (i.e., segregated) schemes that update the velocity fields independently of the temperature field suffer from a severe restriction: the effective time-step taken in this type of iterative procedure is limited by the buoyant time-scale (see [8] and [9]). This holds for both steady and transient problems. It can be extremely restrictive, because conduction time-scales are often between three and six orders of magnitude longer than the buoyant time-scale. Thus, in processes where buoyancy, convection, and conduction are all present, special techniques are required for efficient solution.

This work was supported in part by the Applied Mathematical Sciences subprogram of the Office of Energy Research, U.S. Department of Energy, under contract W-31-109-Eng-38.
Group, Inc., Downers Grove, IL 60515.

In  Section  2  we  describe  the  “laminar  double-glazing  problem.”  This  test  case  has  been  widely 
studied  [10],  and  many  solution  algorithms  have  been  applied  to  it.  It  has  several  interesting  features. 
There  is  a  high  degree  of  nonlinearity  in  the  problem,  causing  a  significant  degree  of  structure  in  the 
resulting  flows.  Narrow  boundary  layers  are  found  for  some  parameter  values.  Unlike  some  other 
“benchmark”  problems,  this  one  is  free  from  singularities  and  also  has  a  simple  geometric 
configuration.  Also,  accurate  answers  are  available  for  this  test  case  [12],  and  comparisons  with  the 
current  results  are  presented. 

In Sections 3 through 7 we discuss various features of the present multigrid method. We start with an overview and continue with various details of our algorithm, concentrating on the relaxation (Section 5) and the treatment of the coarse grid (Section 7).

Finally,  in  Sections  8  and  9  we  present  the  efficiency  of  the  algorithm  in  terms  of  work  units, 
compare  the  accuracy  to  other  solutions,  and  discuss  possible  extensions  of  the  technique. 

Table 3 shows the average rate of convergence and the time per cycle for the various Rayleigh-number/grid-size combinations and a variety of cycles. Although the cycling strategies used are conservative, they appear to be reasonably efficient and robust. For linear problems, multigrid theory predicts that the convergence rate is independent of the mesh size. This is approximately true in the present case as well, which is somewhat surprising since the Fréchet derivative for this problem is large.


2.  GOVERNING  EQUATIONS  AND  FINITE  DIFFERENCE  APPROXIMATION 

We consider the steady-state Navier-Stokes equations for the problem of natural convection in a two-dimensional square cavity subject to differential side heating. The cavity contains a viscous, heat-conducting fluid under conditions for which the Boussinesq approximation may be made.

The equations are given in non-dimensional form using the scales $L^2/\kappa$, $\kappa/L$, and $\rho_0\kappa^2/L^2$ for time, velocity, and pressure, respectively. Here $L$ is a reference length, $\kappa$ is the coefficient of thermal diffusivity, and $\rho_0$ is a reference density. The non-dimensional temperature is defined by $T = (T^* - T^*_c)/(T^*_h - T^*_c)$, where $T^*$ is the local fluid temperature and $T^*_h$, $T^*_c$ denote the temperatures at the hot and cold boundaries, respectively.

In non-dimensional spatial units, the cavity occupies the unit square $[0,1]\times[0,1]$ in the $xy$ plane. The hot boundary is at $x = 0$, the cold boundary is at $x = 1$, and the top and bottom are adiabatic. The $x$ and $y$ components of the scaled velocity are denoted by $u$ and $v$; $p$ denotes the scaled difference of the total pressure from the hydrostatic pressure. The non-dimensional conservative equations for mass, momentum, and energy take the following form:


$$u_x + v_y = 0 \tag{2.1}$$

$$-(uu)_x - (vu)_y + Pr\,(u_{xx} + u_{yy}) - p_x = 0 \tag{2.2}$$

$$-(uv)_x - (vv)_y + Pr\,(v_{xx} + v_{yy}) - p_y + Ra\,Pr\,T = 0 \tag{2.3}$$

$$-(uT)_x - (vT)_y + T_{xx} + T_{yy} = 0 \tag{2.4}$$


where $Pr = \nu/\kappa$ denotes the Prandtl number and $Ra = g\beta L^3 (T^*_h - T^*_c)/(\nu\kappa)$ denotes the Rayleigh number. Here $g$ is the gravitational acceleration, $\beta$ the coefficient of volumetric expansion, and $\nu$ the kinematic viscosity.
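The non-dimensional groups and reference scales defined above can be evaluated with a few lines of code. The helper names are hypothetical, and any consistent set of units (e.g., SI) is assumed.

```python
def nondimensional_groups(g, beta, L, T_hot, T_cold, nu, kappa):
    """Prandtl and Rayleigh numbers as defined after eq. (2.4)."""
    Pr = nu / kappa
    Ra = g * beta * L**3 * (T_hot - T_cold) / (nu * kappa)
    return Pr, Ra

def reference_scales(L, kappa, rho0):
    """Scales for time, velocity and pressure: L^2/kappa, kappa/L, rho0*kappa^2/L^2."""
    return L**2 / kappa, kappa / L, rho0 * kappa**2 / L**2

def nondim_temperature(T, T_hot, T_cold):
    """T = (T* - T*_c) / (T*_h - T*_c): equals 1 on the hot wall and 0 on the cold wall."""
    return (T - T_cold) / (T_hot - T_cold)


print(nondimensional_groups(10.0, 0.5, 2.0, 3.0, 1.0, 0.5, 0.25))  # (2.0, 640.0)
```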

These  equations  are  discretized  using  a  hybrid  finite-differencing  scheme  [1],  which  employs 
second-order  central  differences  on  the  convection  and  diffusion  terms  when  the  local  cell  Reynolds 
number  is  less  than  two.  However,  when  the  local  cell  Reynolds  number  is  greater  than  two,  the 
scheme  modifies  the  convective  differencing  procedure  to  a  donor  cell  (upwind)  formulation  and 
presumes  that  the  diffusion  flux  at  the  cell  interfaces  is  small  by  comparison  to  the  convection  flux 
and thus can be ignored. This scheme provides reasonable accuracy for sufficiently small mesh sizes while remaining stable (i.e., $h$-elliptic) on the coarse grids.

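A standard staggered mesh is overlaid on the domain. The velocities are associated with the cell faces, and the pressures and temperatures are associated with the cell centers. The mesh is uniform, with cell dimensions $\delta x$ and $\delta y$. Note the border of fictitious cells and the placement of the tangential velocity components at the domain boundary.

Consider the $(i,j)$-th cell. The pressure associated with the cell center is denoted by $p_{ij}$. The $x$ component of the velocity associated with the center of the right-hand face is denoted by $u_{i+1/2,j}$, and the $y$ component of the velocity associated with the top face is denoted by $v_{i,j+1/2}$. The resulting finite-difference equations can be written in the following form:

$$(A^u_c)_{i+1/2,j}\,u_{i+1/2,j} = (A^u_e)_{i+1/2,j}\,u_{i+3/2,j} + (A^u_w)_{i+1/2,j}\,u_{i-1/2,j} + (A^u_n)_{i+1/2,j}\,u_{i+1/2,j+1} + (A^u_s)_{i+1/2,j}\,u_{i+1/2,j-1} + \frac{p_{ij} - p_{i+1,j}}{\delta x} \tag{2.5}$$

$$(A^v_c)_{i,j+1/2}\,v_{i,j+1/2} = (A^v_e)_{i,j+1/2}\,v_{i+1,j+1/2} + (A^v_w)_{i,j+1/2}\,v_{i-1,j+1/2} + (A^v_n)_{i,j+1/2}\,v_{i,j+3/2} + (A^v_s)_{i,j+1/2}\,v_{i,j-1/2} + \frac{p_{ij} - p_{i,j+1}}{\delta y} + \tfrac12 Ra\,Pr\,(T_{ij} + T_{i,j+1}) \tag{2.6}$$

$$\frac{u_{i+1/2,j} - u_{i-1/2,j}}{\delta x} + \frac{v_{i,j+1/2} - v_{i,j-1/2}}{\delta y} = 0 \tag{2.7}$$

$$(A^T_c)_{ij}\,T_{ij} = (A^T_e)_{ij}\,T_{i+1,j} + (A^T_w)_{ij}\,T_{i-1,j} + (A^T_n)_{ij}\,T_{i,j+1} + (A^T_s)_{ij}\,T_{i,j-1} \tag{2.8}$$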

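The coefficients are defined as follows. For $\phi = u$, $v$, and $T$, we have

$$A^\phi_c = A^\phi_e + A^\phi_w + A^\phi_n + A^\phi_s \tag{2.9}$$

$$A^\phi_w = \max\bigl(|C^\phi_{x-}|, D^\phi_{x-}\bigr) + C^\phi_{x-} \tag{2.10}$$

$$A^\phi_e = \max\bigl(|C^\phi_{x+}|, D^\phi_{x+}\bigr) - C^\phi_{x+} \tag{2.11}$$

$$A^\phi_s = \max\bigl(|C^\phi_{y-}|, D^\phi_{y-}\bigr) + C^\phi_{y-} \tag{2.12}$$

$$A^\phi_n = \max\bigl(|C^\phi_{y+}|, D^\phi_{y+}\bigr) - C^\phi_{y+} \tag{2.13}$$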


where  the  differential  form  of  the  coefficients  is  used  to  give  the  correct  scalings  across  the  grids. 
Thus  we  have 
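$$C^u_{x-} = \frac{u_{i+1/2,j} + u_{i-1/2,j}}{4\,\delta x}, \qquad D^u_{x-} = \frac{Pr}{\delta x^2} \tag{2.14}$$

$$C^u_{x+} = \frac{u_{i+1/2,j} + u_{i+3/2,j}}{4\,\delta x}, \qquad D^u_{x+} = \frac{Pr}{\delta x^2} \tag{2.15}$$

$$C^u_{y-} = \frac{v_{i,j-1/2} + v_{i+1,j-1/2}}{4\,\delta y}, \qquad D^u_{y-} = \frac{Pr}{\delta y^2} \tag{2.16}$$

$$C^u_{y+} = \frac{v_{i,j+1/2} + v_{i+1,j+1/2}}{4\,\delta y}, \qquad D^u_{y+} = \frac{Pr}{\delta y^2} \tag{2.17}$$

$$C^v_{x-} = \frac{u_{i-1/2,j} + u_{i-1/2,j+1}}{4\,\delta x}, \qquad D^v_{x-} = \frac{Pr}{\delta x^2} \tag{2.18}$$

$$C^v_{x+} = \frac{u_{i+1/2,j} + u_{i+1/2,j+1}}{4\,\delta x}, \qquad D^v_{x+} = \frac{Pr}{\delta x^2} \tag{2.19}$$

$$C^v_{y-} = \frac{v_{i,j-1/2} + v_{i,j+1/2}}{4\,\delta y}, \qquad D^v_{y-} = \frac{Pr}{\delta y^2} \tag{2.20}$$

$$C^v_{y+} = \frac{v_{i,j+1/2} + v_{i,j+3/2}}{4\,\delta y}, \qquad D^v_{y+} = \frac{Pr}{\delta y^2} \tag{2.21}$$

$$C^T_{x-} = \frac{u_{i-1/2,j}}{2\,\delta x}, \qquad D^T_{x-} = \frac{1}{\delta x^2} \tag{2.22}$$

$$C^T_{x+} = \frac{u_{i+1/2,j}}{2\,\delta x}, \qquad D^T_{x+} = \frac{1}{\delta x^2} \tag{2.23}$$

$$C^T_{y-} = \frac{v_{i,j-1/2}}{2\,\delta y}, \qquad D^T_{y-} = \frac{1}{\delta y^2} \tag{2.24}$$

$$C^T_{y+} = \frac{v_{i,j+1/2}}{2\,\delta y}, \qquad D^T_{y+} = \frac{1}{\delta y^2} \tag{2.25}$$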
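The hybrid rule (2.9)-(2.13) switches between central and donor-cell differencing on each face independently. The sketch below evaluates it for the energy equation, using the $C$ and $D$ values of (2.22)-(2.25). The dictionary layout and the function names are assumptions. The mesh size $\delta x = \delta y = 0.125$ is chosen only so that the arithmetic is exact.

```python
def hybrid_coefficients(C, D):
    """Hybrid-scheme coefficients from eqs. (2.9)-(2.13).

    C and D map the face labels 'x-', 'x+', 'y-', 'y+' to the convection and
    diffusion numbers. Returns (A_w, A_e, A_s, A_n, A_c).
    """
    A_w = max(abs(C["x-"]), D["x-"]) + C["x-"]
    A_e = max(abs(C["x+"]), D["x+"]) - C["x+"]
    A_s = max(abs(C["y-"]), D["y-"]) + C["y-"]
    A_n = max(abs(C["y+"]), D["y+"]) - C["y+"]
    return A_w, A_e, A_s, A_n, A_w + A_e + A_s + A_n

def temperature_cd(u_w, u_e, v_s, v_n, dx, dy):
    """C and D of the energy equation, eqs. (2.22)-(2.25)."""
    C = {"x-": u_w / (2 * dx), "x+": u_e / (2 * dx),
         "y-": v_s / (2 * dy), "y+": v_n / (2 * dy)}
    D = {"x-": 1 / dx**2, "x+": 1 / dx**2, "y-": 1 / dy**2, "y+": 1 / dy**2}
    return C, D


# Slow flow (|u| dx < 2): central differencing, A_w = D + C and A_e = D - C.
print(hybrid_coefficients(*temperature_cd(1.0, 1.0, 0.0, 0.0, 0.125, 0.125)))
# Fast flow (|u| dx > 2): donor cell, A_w = u/dx, A_e = 0, diffusion dropped.
print(hybrid_coefficients(*temperature_cd(40.0, 40.0, 0.0, 0.0, 0.125, 0.125)))
```

The first call gives (68.0, 60.0, 64.0, 64.0, 256.0) and the second gives (320.0, 0.0, 64.0, 64.0, 448.0). Once $|u|\,\delta x > 2$, the downstream coefficient vanishes and the upstream one becomes the upwind value $u/\delta x$.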

The cavity is assumed to be solid, and no-slip conditions are assumed to prevail; thus the normal and tangential components of the velocity are set to zero at the boundary. The temperature at the left-hand wall has the value one, while the temperature at the right-hand wall has the value zero. The adiabatic walls imply that the normal derivative of the temperature is zero at these walls.

Note that the tangential velocities in the border cells are associated with the walls. The temperatures in the border cells adjacent to the hot and cold walls are also associated with these walls, as indicated in Figure 1. A minor modification to the diffusion terms in the energy equation allows the temperature boundary conditions to be modelled without loss of accuracy.


3.  BASIC  MULTIGRID  TECHNIQUES 

The finite difference equations derived in the preceding section are solved by a multigrid technique. For a complete review of multigrid techniques with applications to fluid dynamics, the articles by Brandt [2] and Brandt and Dinar [4] may be consulted. An introduction to the subject may be found in a review by Stüben and Trottenberg [3].

The basic multigrid technique used in this application is the FAS (Full Approximation Storage) method, which is fully discussed in [2], [3], and [4]. The basic approach can be described as follows. We define a series of uniform grids with spacing $h_k = h_{k-1}/2$ for $k = 2, \ldots, M$. In addition, we have a set of grid transfer operators $I_k^{k-1}$, $\hat I_k^{k-1}$, and $I_{k-1}^k$. The first two operators map grid functions defined on grid $k$ to functions defined on grid $k-1$ (restriction). The last operator transfers functions defined on grid $k-1$ to functions defined on grid $k$ (interpolation). Starting on the finest grid $k = M$ and setting $f^k = F^k$, we wish to solve

$$L^k(u^k) = f^k . \tag{3.1}$$

Relaxation iterations are performed, generating a grid function $w^k$. A transfer is then made to the next coarser grid, $k-1$, where the following problem is posed:

$$L^{k-1}(u^{k-1}) = f^{k-1} \equiv I_k^{k-1}\bigl(f^k - L^k(w^k)\bigr) + L^{k-1}\bigl(\hat I_k^{k-1} w^k\bigr) . \tag{3.2}$$

Relaxation iterations are used to generate a grid function $w^{k-1}$. At this point a decision is made either to restrict to the next coarser grid and repeat the above process, or to interpolate back to the $k$-th grid and generate a new approximation to $w^k$ with the expression

$$w^k_{\text{new}} = w^k + I_{k-1}^k\bigl(w^{k-1} - \hat I_k^{k-1} w^k\bigr) . \tag{3.3}$$

An FAS method is not completely specified until one defines a strategy concerning when, and in what direction, to transfer from one grid to another. A strategy based on smoothing rates and convergence is usually referred to as adaptive FAS. A strategy based on a fixed cyclic pattern of grid transfers and a fixed number of relaxation iterations on each grid is usually referred to as cyclic FAS, with a prefix specifying the type of cycle, e.g., V, W, F [2]. Our study is concerned with both adaptive and fixed-cycle FAS methods.
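The FAS steps (3.1)-(3.3) can be illustrated on a much simpler model than the cavity problem. The sketch below makes the following assumptions, none of which is taken from the paper:

* the model equation is the 1D nonlinear problem $-u'' + u^3 = f$ on $(0,1)$ with zero Dirichlet data;
* the smoother is pointwise nonlinear Gauss-Seidel, with one scalar Newton step per point;
* full weighting is used for $I_k^{k-1}$, injection for $\hat I_k^{k-1}$, and linear interpolation for $I_{k-1}^k$;
* the cycle is a fixed V-cycle.

All function names are hypothetical.

```python
import numpy as np

def apply_op(u, h):
    """Discrete L(u) = -u'' + u^3 at interior points; zero at the two ends."""
    Lu = np.zeros_like(u)
    Lu[1:-1] = (2 * u[1:-1] - u[:-2] - u[2:]) / h**2 + u[1:-1] ** 3
    return Lu

def relax(u, f, h, sweeps):
    """Nonlinear Gauss-Seidel: one scalar Newton step per interior point."""
    for _ in range(sweeps):
        for i in range(1, len(u) - 1):
            F = (2 * u[i] - u[i - 1] - u[i + 1]) / h**2 + u[i] ** 3 - f[i]
            u[i] -= F / (2 / h**2 + 3 * u[i] ** 2)
    return u

def restrict_fw(r):
    """Full-weighting restriction, playing the role of I_k^{k-1} for residuals."""
    rc = r[::2].copy()
    rc[1:-1] = 0.25 * r[1:-2:2] + 0.5 * r[2:-1:2] + 0.25 * r[3::2]
    return rc

def prolong(ec):
    """Linear interpolation, playing the role of I_{k-1}^k."""
    ef = np.zeros(2 * (len(ec) - 1) + 1)
    ef[::2] = ec
    ef[1::2] = 0.5 * (ec[:-1] + ec[1:])
    return ef

def fas_vcycle(u, f, h, nu1=2, nu2=2):
    """One FAS V-cycle; len(u) - 1 must be a power of two. u is updated in place."""
    if len(u) <= 3:                      # one interior point: just converge it
        return relax(u, f, h, 50)
    relax(u, f, h, nu1)
    uc0 = u[::2].copy()                  # injection plays the role of hat-I_k^{k-1}
    # FAS coarse-grid right-hand side, cf. (3.2)
    fc = restrict_fw(f - apply_op(u, h)) + apply_op(uc0, 2 * h)
    uc = fas_vcycle(uc0.copy(), fc, 2 * h, nu1, nu2)
    u += prolong(uc - uc0)               # transfer only the change, cf. (3.3)
    relax(u, f, h, nu2)
    return u
```

Choose $f$ so that $u = \sin \pi x$ solves the continuous problem, and start from $u = 0$ on a 64-interval grid. Repeated calls to `fas_vcycle` then drive the discrete residual down by many orders of magnitude.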


4.  MULTIGRID  STRATEGIES 

In  this  section  we  describe  the  adaptive  strategy  and  the  W-cycle  strategy  used  in  this  study.  Both 
strategies  are  described  in  Brandt  [2]  and  have  been  used  in  many  different  investigations. 

4.1.  Adaptive  FAS 

The particular implementation of the adaptive FAS algorithm used in this study is essentially the same as that described by Brandt [2]. The process is initiated on the coarsest grid (grid number 1), where the solution of the complete nonlinear problem is sought. At this level, Newton-type iterations are used, and the resulting linear equations are solved by a direct method. The converged solution on this grid is prolongated to the next finer grid, where relaxation sweeps are performed. Since the problem is nonlinear, the coefficients are re-evaluated after each sweep.

If the smoothing rate, as measured by the ratio of successive norms of the current residual, falls below a given threshold value $\eta$, a decision is made to transfer to the coarser mesh $k-1$. The residuals are transferred to grid $k-1$, and one solves for grid functions $w^{k-1}$ which are approximations to $\hat I_k^{k-1} w^k$. The problem on grid $k-1$ is

$$L^{k-1}(w^{k-1}) = f^{k-1} \equiv I_k^{k-1}\bigl(f^k - L^k(w^k)\bigr) + L^{k-1}\bigl(\hat I_k^{k-1} w^k\bigr) . \tag{4.1}$$

If the grid function $w^{k-1}$ generated at this level is satisfactory, the corrected grid function at the $k$-th level is then

$$w^k \leftarrow w^k + I_{k-1}^k\bigl(w^{k-1} - \hat I_k^{k-1} w^k\bigr) . \tag{4.2}$$

Note that it is the correction $w^{k-1} - \hat I_k^{k-1} w^k$ that is transferred, not the grid function $w^{k-1}$ itself. Also, the relaxation sweeps for Equation (4.1) are started from the initial grid function $\hat I_k^{k-1} w^k$.

At any stage there is a current finest level $l$. When the convergence tolerance is met on this level, the grid function is prolongated to a finer level. Thus an adaptive FAS process is nested, with many visits to the coarser grids. When the finest level ($k = M$) is solved to the desired accuracy, the overall solution cycle is terminated.

Note that the tolerance on any grid equals the originally prescribed value only when that grid is the current finest level $l$. When the current level $k$ is less than $l$, however, the tolerance $\varepsilon_k$ is set to

$$\varepsilon_k = \delta\,\varepsilon_{k+1} , \tag{4.3}$$

where $\varepsilon_{k+1}$ is a norm of the residual on grid $k+1$, and typically $\delta = 0.2$.

When restricting from grid $k$ to grid $k-1$, the coefficients defined in Equations (2.9)-(2.13) are initially generated from the restricted grid function $\hat I_k^{k-1} w^k$. For succeeding iterations, the coefficients are generated from the latest values $w^{k-1}$.
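The switching logic of the adaptive FAS process can be sketched as follows. The sketch assumes that the smoothing rate is measured as the ratio new/old of successive residual norms, so that slow smoothing means a ratio near 1. Under that assumption, "the smoothing rate falls below η" is read as this ratio exceeding η. The function names and string return values are hypothetical. The coarse-grid tolerance follows (4.3) with δ = 0.2.

```python
def coarse_tolerance(eps_fine, delta=0.2):
    """Eq. (4.3): tolerance on grid k when k is below the current finest level."""
    return delta * eps_fine

def adaptive_fas_step(res_norms, eta, eps_k):
    """Next action on grid k, given the residual norms after each sweep so far.

    Assumption: the smoothing rate is the ratio new/old of successive norms.
    """
    if res_norms[-1] <= eps_k:
        return "done"        # tolerance met: prolongate the correction (or refine)
    if len(res_norms) >= 2 and res_norms[-1] / res_norms[-2] > eta:
        return "restrict"    # smoothing has slowed down: go to grid k-1
    return "relax"           # still smoothing well: another sweep on grid k


print(adaptive_fas_step([1.0, 0.9], eta=0.6, eps_k=0.01))  # restrict
```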

4.2. W-Cycles

The second type of multigrid algorithm used in this study is based on W-cycles (cf. [2] and [3]). Cyclic algorithms cycle through the grids in a specified pattern while performing a given number of smoothing iterations on each visit to each grid. The visitation pattern of W-cycles can be described recursively as follows.

For $M = 2$, we start with relaxation iterations on the finest grid ($M = 2$), restrict to the coarser grid $M-1 = 1$, perform a direct solution or relaxation solution, and then prolongate to the finest grid ($M = 2$) for further relaxations. Let this cycle be denoted by $W(2)$. In general, if $M$ is given and $W(M-1)$ is defined, we generate $W(M)$ as follows. Starting on grid $M$, we perform relaxation iterations and then restrict to grid $M-1$. Next we perform two $W(M-1)$ cycles in succession, prolongate to grid $M$, and finish with relaxation iterations.
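The recursive definition of $W(M)$ translates directly into a function that lists the grid levels visited, with grid 1 being the coarsest. The second $W(M-1)$ begins on grid $M-1$, where the first one ended, so the sketch counts that shared visit only once. The function name is hypothetical.

```python
def w_cycle_pattern(M):
    """Sequence of grid levels visited by one W(M) cycle (grid 1 = coarsest).

    W(2) = 2 -> 1 -> 2. W(M) relaxes on M, restricts to M-1, runs two W(M-1)
    cycles back to back, and prolongates to M. The second W(M-1) starts on
    grid M-1 where the first one ended, so that visit is shared.
    """
    if M < 2:
        raise ValueError("a W-cycle needs at least two grids")
    if M == 2:
        return [2, 1, 2]
    inner = w_cycle_pattern(M - 1)
    return [M] + inner + inner[1:] + [M]


print(w_cycle_pattern(3))  # [3, 2, 1, 2, 1, 2, 3]
```

The coarsest grid is visited $2^{M-2}$ times per cycle, which is why W-cycles lean more heavily on coarse-grid work than V-cycles do.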

To complete the description of a $W(M)$ cycle, we need to specify the number of relaxation iterations performed on each grid. In this study, we specified a $W(M)$ cycle by three parameters $\nu_c$, $\nu_p$, $\nu_r$:

* $\nu_c$ specifies the number of Newton iterations performed on the coarsest grid ($k = 1$);
* for $k > 1$, $\nu_r$ specifies the number of relaxation iterations performed on grid $k$ when that grid is reached by a restriction from grid $k+1$;
* for $k > 1$, $\nu_p$ specifies the number of relaxation iterations performed on grid $k$ when that grid is reached by a prolongation from grid $k-1$.

A local peak in the $W(M)$ cycle occurs when a grid is reached by prolongation and is followed by a restriction. At local peaks, the number of relaxation iterations is taken to be $\nu = \max(\nu_r, \nu_p)$.

To generate the initial grid function on the finest grid, we use a simple starting procedure. A specified number $\nu^0$ of Newton iterations is performed on the coarsest grid ($k = 1$). The result is then successively prolongated to the next finer grid, where $\nu_p$ relaxation iterations are performed. This process is repeated until the finest grid is reached, at which point the W-cycle starts. The number of iterations on the finest grid is taken to be $\nu = \nu_r + \nu_p$, where $\nu_p$ iterations come from the previous W-cycle and $\nu_r$ iterations arise from the current cycle.

To avoid excessive iterations, we make the following test: on any intermediate grid $k < M$, the relaxation iterations are terminated after one additional iteration when

$$\varepsilon_k < \delta\,\varepsilon_M , \tag{4.4}$$

where $\varepsilon_k$ is a norm of the current residual, $\delta = 0.001$, and $\varepsilon_M$ is the error tolerance on the finest grid. The W-cycles are repeated until convergence is achieved on the finest grid.
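Given a visiting sequence such as the one for $W(3)$, the rules for $\nu_r$, $\nu_p$, $\nu_c$ and the local-peak rule $\nu = \max(\nu_r, \nu_p)$ can be applied mechanically. The sketch below makes two assumptions. First, grid 1 is the coarsest grid. Second, the first and last visits to the finest grid within one cycle count $\nu_r$ and $\nu_p$ respectively, so that consecutive cycles give $\nu_r + \nu_p$ on the finest grid, as stated above. The sketch omits the early-termination test (4.4), and the function name is hypothetical.

```python
def relaxation_counts(pattern, nu_r, nu_p, nu_c):
    """Iterations per visit for a W-cycle visiting sequence (grid 1 = coarsest)."""
    counts = []
    last = len(pattern) - 1
    for idx, k in enumerate(pattern):
        if k == 1:
            counts.append(nu_c)                  # Newton iterations on coarsest grid
            continue
        from_coarser = idx > 0 and pattern[idx - 1] < k       # reached by prolongation
        to_coarser = idx < last and pattern[idx + 1] < k      # followed by restriction
        if from_coarser and to_coarser:
            counts.append(max(nu_r, nu_p))       # local peak
        elif from_coarser:
            counts.append(nu_p)
        else:
            counts.append(nu_r)                  # reached by restriction (or cycle start)
    return counts


print(relaxation_counts([3, 2, 1, 2, 1, 2, 3], nu_r=2, nu_p=1, nu_c=3))
# [2, 2, 3, 2, 3, 1, 1]
```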


5.  RELAXATION  TECHNIQUES 

The choice of an efficient relaxation (smoothing) operator is of primary importance for the success of the multigrid technique. The choice of a relaxation procedure is somewhat problem-dependent. There is a tradeoff between a robust technique with a larger operation count and a less robust but simpler technique with a lower operation count. Of course, the primary objective in the design of a relaxation procedure is to achieve the best possible smoothing rate.

In this study, the relaxation technique is a modification of the procedure introduced by Vanka [5]. The temperature, momentum, and continuity equations are relaxed in a coupled manner. In this scheme the temperature, four velocities, and one pressure associated with one finite-difference cell are updated simultaneously by solving a 6x6 set of equations with special structure. Thus the velocities on all four sides of a cell are updated together. This type of procedure is referred to as a symmetrical coupled Gauss-Seidel (SCGS) procedure. The details of the procedure, when applied to the natural convection problem, are as follows.

For any given grid level $k$, consider a staggered mesh centered at cell $(i,j)$, which we take to be an interior cell. We are given a set of grid functions $T_{ij}$, $U_{i+1/2,j}$, $V_{i,j+1/2}$, $P_{ij}$ and a set of right-hand side grid functions $S^T_{ij}$, $S^U_{i+1/2,j}$, $S^V_{i,j+1/2}$, $S^P_{ij}$. The right-hand side functions are generated from residual and variable transfers, as indicated by the right-hand side of Equation (4.1).

The variables $T_{ij}$, $U_{i+1/2,j}$, ... are used to generate the finite difference coefficients $(A^T_c)_{ij}$, $(A^T_e)_{ij}$, ... as specified in Equations (2.9)-(2.13). With these coefficients defined over the entire mesh, our task is to solve Equation (4.1). We write this equation in block form as follows. We order the mesh cells lexicographically. With each mesh cell $(i,j)$ (cf. Figure 4) we group the following set of six variables as a unit to determine the block structure:

$$\bigl(T_{ij},\ U_{i-1/2,j},\ U_{i+1/2,j},\ V_{i,j-1/2},\ V_{i,j+1/2},\ P_{ij}\bigr) . \tag{5.1}$$

With this blocking and ordering of the mesh cells, Equation (4.1) has the following form:

$$AX = S , \tag{5.2}$$

where

$$A = D - L - U . \tag{5.3}$$

Here $D$ is the block diagonal matrix found from the grouping of the six variables in the $(i,j)$-th cell, and $L$, $U$ are block lower and upper triangular matrices relative to this ordering. The particular form of relaxation used in this study is motivated by the following considerations. Standard block Gauss-Seidel relaxation applied to Equation (5.2) would take the following form:

$$D\hat X = L X^{(1)} + U X^{(0)} + S , \tag{5.4a}$$

$$X^{(1)} = \omega \hat X + (1-\omega) X^{(0)} , \tag{5.4b}$$

where $X^{(0)}$ is some initial estimate, $\omega$ is a given parameter, and $X^{(1)}$ is the new estimate generated by the procedure. This relaxation procedure can be rewritten by setting

$$R = S + L X^{(1)} + U X^{(0)} - D X^{(0)} = S - \bigl(D X^{(0)} - L X^{(1)} - U X^{(0)}\bigr) \tag{5.5}$$

and then observing that

$$D\bigl(\hat X - X^{(0)}\bigr) = R . \tag{5.6}$$

Combining this result with (5.4b), we find

$$\frac{1}{\omega}\, D\bigl(X^{(1)} - X^{(0)}\bigr) = R . \tag{5.7}$$

Equation (5.7) is the basis for the relaxation procedure used in this study. We have modified this procedure by applying the factor $1/\omega$ only to the diagonal elements of $D$ rather than to all the elements of $D$.
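The step from (5.4) to (5.7) can be checked numerically. The sketch below uses the simplest instance, 1x1 blocks (point Gauss-Seidel with factor ω), together with a random, diagonally dominant test matrix. Both choices are assumptions made for illustration, and `gs_relax` is a hypothetical helper. The check confirms that the new iterate satisfies $\frac{1}{\omega}D(X^{(1)} - X^{(0)}) = R$, with $R$ from (5.5).

```python
import numpy as np

def gs_relax(A, S, X0, omega):
    """One sweep of (5.4a)-(5.4b) with 1x1 blocks, where A = D - L - U."""
    X1 = X0.copy()
    for i in range(len(S)):
        # L X^(1) uses entries already updated in this sweep; U X^(0) uses old ones.
        xhat = (S[i] - A[i, :i] @ X1[:i] - A[i, i + 1:] @ X0[i + 1:]) / A[i, i]
        X1[i] = omega * xhat + (1.0 - omega) * X0[i]
    return X1


rng = np.random.default_rng(0)
n, omega = 6, 0.7
A = rng.normal(size=(n, n)) + 10.0 * np.eye(n)
S, X0 = rng.normal(size=n), rng.normal(size=n)
X1 = gs_relax(A, S, X0, omega)

D = np.diag(np.diag(A))
L, U = -np.tril(A, -1), -np.triu(A, 1)          # so that A = D - L - U
R = S + L @ X1 + U @ X0 - D @ X0                 # eq. (5.5)
print(np.allclose(D @ (X1 - X0) / omega, R))     # eq. (5.7): True
```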

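As indicated in Equation (5.7), we solve for the corrections. For example, we write

$$T^{(1)}_{ij} = T'_{ij} + T^{(0)}_{ij} , \tag{5.8}$$

with similar expressions for the other variables. In Equation (5.5), set

$$Q = D X^{(0)} - L X^{(1)} - U X^{(0)} . \tag{5.9}$$

We define the following quantities associated with the $(i,j)$-th cell, where each variable takes its most recently updated value:

$$Q^T_{ij} = (A^T_c)_{ij} T_{ij} - (A^T_e)_{ij} T_{i+1,j} - (A^T_w)_{ij} T_{i-1,j} - (A^T_n)_{ij} T_{i,j+1} - (A^T_s)_{ij} T_{i,j-1} \tag{5.10}$$

$$Q^U_{i-1/2,j} = (A^u_c)_{i-1/2,j} U_{i-1/2,j} - (A^u_e)_{i-1/2,j} U_{i+1/2,j} - (A^u_w)_{i-1/2,j} U_{i-3/2,j} - (A^u_n)_{i-1/2,j} U_{i-1/2,j+1} - (A^u_s)_{i-1/2,j} U_{i-1/2,j-1} + \frac{P_{ij} - P_{i-1,j}}{\delta x} \tag{5.11}$$

$$Q^U_{i+1/2,j} = (A^u_c)_{i+1/2,j} U_{i+1/2,j} - (A^u_e)_{i+1/2,j} U_{i+3/2,j} - (A^u_w)_{i+1/2,j} U_{i-1/2,j} - (A^u_n)_{i+1/2,j} U_{i+1/2,j+1} - (A^u_s)_{i+1/2,j} U_{i+1/2,j-1} + \frac{P_{i+1,j} - P_{ij}}{\delta x} \tag{5.12}$$

$$Q^V_{i,j-1/2} = (A^v_c)_{i,j-1/2} V_{i,j-1/2} - (A^v_e)_{i,j-1/2} V_{i+1,j-1/2} - (A^v_w)_{i,j-1/2} V_{i-1,j-1/2} - (A^v_n)_{i,j-1/2} V_{i,j+1/2} - (A^v_s)_{i,j-1/2} V_{i,j-3/2} + \frac{P_{ij} - P_{i,j-1}}{\delta y} - \tfrac12 Ra\,Pr\,(T_{i,j-1} + T_{ij}) \tag{5.13}$$

$$Q^V_{i,j+1/2} = (A^v_c)_{i,j+1/2} V_{i,j+1/2} - (A^v_e)_{i,j+1/2} V_{i+1,j+1/2} - (A^v_w)_{i,j+1/2} V_{i-1,j+1/2} - (A^v_n)_{i,j+1/2} V_{i,j+3/2} - (A^v_s)_{i,j+1/2} V_{i,j-1/2} + \frac{P_{i,j+1} - P_{ij}}{\delta y} - \tfrac12 Ra\,Pr\,(T_{ij} + T_{i,j+1}) \tag{5.14}$$

$$Q^P_{ij} = \frac{U_{i+1/2,j} - U_{i-1/2,j}}{\delta x} + \frac{V_{i,j+1/2} - V_{i,j-1/2}}{\delta y} \tag{5.15}$$

Then form $R$ by setting

$$R^T_{ij} = S^T_{ij} - Q^T_{ij} \tag{5.16a}$$

$$R^U_{i\pm1/2,j} = S^U_{i\pm1/2,j} - Q^U_{i\pm1/2,j} \tag{5.16b}$$

$$R^V_{i,j\pm1/2} = S^V_{i,j\pm1/2} - Q^V_{i,j\pm1/2} \tag{5.16c}$$

$$R^P_{ij} = S^P_{ij} - Q^P_{ij} \tag{5.16d}$$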


At  this  juncture  we  have  calculated  the  right-hand  side  for  Equation  (5.7).  As  mentioned  earlier,  we 
have  modified  the  relaxation  procedure  described  by  Equation  (5.7).  Actually,  we  have  modified  the 
procedure  in  two  ways: 


(i) The factor $1/\omega$ will not multiply the entire 6x6 block matrix, but only its diagonal entries.


(ii)  Since  the  problem  is  nonlinear,  a  local  approximation  is  made  to  the  Jacobian  to  bring  in  the 
effect  of  velocity  on  the  temperature.  The  effect  is  to  modify  the  block  diagonal  matrix  D  in 
Equation  (5.7). 
Thus Equation (5.4) is replaced by the nonlinear Gauss-Seidel relaxation scheme:

$$D(\hat X)\,\hat X = L X^{(1)} + U X^{(0)} + S , \tag{5.17a}$$

$$X^{(1)} = \omega \hat X + (1-\omega) X^{(0)} . \tag{5.17b}$$

Consider the $(i,j)$-th cell, and let $\Phi$ denote the group of six variables defined in Equation (5.1) which are associated with that cell. For this cell we wish to solve the vector equation (of dimension 6)

$$D(\Phi)\,\Phi = F , \tag{5.18}$$

where $F$ is the right-hand side of (5.17a) restricted to the $(i,j)$-th cell, and $D(\Phi)$ is the 6x6 matrix associated with this cell. So for this cell, the task is to solve the vector equation

$$G(\Phi) = F - D(\Phi)\,\Phi = 0 . \tag{5.19}$$

Using Newton's method with $\Phi^{(0)}$ as the initial estimate would lead to the linear system

$$H\Phi' = G(\Phi^{(0)}) = F - D(\Phi^{(0)})\,\Phi^{(0)} = R . \tag{5.20}$$

Note that $R$ here is the portion of the vector $R$ appearing in Equation (5.7) that is associated with the $(i,j)$-th cell. Here $H$ is the negative of the Jacobian of $G(\Phi)$, given by

$$H_{pq} = D_{pq}(\Phi^{(0)}) + \sum_{s=1}^{6} \Phi^{(0)}_s\, \frac{\partial D_{ps}}{\partial \Phi_q}(\Phi^{(0)}) . \tag{5.21}$$

The simplest approximation to $H_{pq}$ is $D_{pq}(\Phi^{(0)})$ (the frozen coefficient approximation). In this study, we have incorporated some of the terms from the summation appearing in (5.21).

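With the variables associated with the cell grouped as indicated in (5.1), the 6x6 matrix $D_{pq}$, which is formed from the frozen finite-difference coefficients, is defined by the following expression:

$$D(\Phi^{(0)}) = \begin{pmatrix}
(A^T_c)_{ij} & 0 & 0 & 0 & 0 & 0 \\
0 & (A^u_c)_{i-1/2,j} & -(A^u_e)_{i-1/2,j} & 0 & 0 & 1/\delta x \\
0 & -(A^u_w)_{i+1/2,j} & (A^u_c)_{i+1/2,j} & 0 & 0 & -1/\delta x \\
-\tfrac12 Ra\,Pr & 0 & 0 & (A^v_c)_{i,j-1/2} & -(A^v_n)_{i,j-1/2} & 1/\delta y \\
-\tfrac12 Ra\,Pr & 0 & 0 & -(A^v_s)_{i,j+1/2} & (A^v_c)_{i,j+1/2} & -1/\delta y \\
0 & -1/\delta x & 1/\delta x & -1/\delta y & 1/\delta y & 0
\end{pmatrix} , \tag{5.22}$$

where

$$\Phi^{(0)} = \bigl(T_{ij},\ U_{i-1/2,j},\ U_{i+1/2,j},\ V_{i,j-1/2},\ V_{i,j+1/2},\ P_{ij}\bigr)^T , \tag{5.23}$$

and the coefficients $A^T_c$, $A^u_c$, ... are evaluated using $\Phi^{(0)}$.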

To form the $H_{pq}$ used in this study, we have used only those additional terms in Equation (5.21) that give the velocity contribution in the temperature equation. Thus we set $p = 1$ and observe that

$$D_{1s}(\Phi^{(0)}) = 0, \qquad s = 2, 3, \ldots, 6 . \tag{5.24}$$

Hence

$$H_{1q} = D_{1q} + \Phi^{(0)}_1\, \frac{\partial D_{11}}{\partial \Phi_q}(\Phi^{(0)}), \qquad q = 1, 2, \ldots, 6 . \tag{5.25}$$

Recall that $\Phi^{(0)}_1 = T_{ij}$ and $D_{11}(\Phi^{(0)}) = (A^T_c)_{ij}$. From Equation (2.9),

$$A^T_c = A^T_e + A^T_w + A^T_n + A^T_s . \tag{5.26}$$

Here it is understood that we are dealing with the $(i,j)$-th cell, so this subscript is omitted for simplicity. The coefficients $A^T_e$, $A^T_w$, $A^T_n$, $A^T_s$ are defined by Equations (2.10)-(2.13) and Equations (2.22)-(2.25). Consider, for example,


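$$H_{12} = \Phi^{(0)}_1\, \frac{\partial D_{11}}{\partial \Phi_2}(\Phi^{(0)}) = T_{ij}\, \frac{\partial A^T_c}{\partial U_{i-1/2,j}} = T_{ij}\, \frac{\partial A^T_w}{\partial U_{i-1/2,j}} , \tag{5.27}$$

since only $A^T_w$ depends on $U_{i-1/2,j}$. From the definition of $A^T_w$ we have

$$A^T_w = \max\bigl(|C^T_{x-}|, D^T_{x-}\bigr) + C^T_{x-} \tag{5.28}$$

with

$$C^T_{x-} = \frac{U_{i-1/2,j}}{2\,\delta x}, \qquad D^T_{x-} = \frac{1}{\delta x^2} . \tag{5.29}$$

Define

$$\sigma_{x-} = \max\bigl(0, \operatorname{sign}(U_{i-1/2,j})\bigr) . \tag{5.30}$$

Then

$$\frac{\partial A^T_w}{\partial U_{i-1/2,j}} = \begin{cases} \sigma_{x-}/\delta x & \text{if } |U_{i-1/2,j}|\,\delta x \ge 2, \\ 1/(2\,\delta x) & \text{if } |U_{i-1/2,j}|\,\delta x < 2 . \end{cases} \tag{5.31}$$

In a similar manner, we define

$$\sigma_{x+} = \max\bigl(0, \operatorname{sign}(-U_{i+1/2,j})\bigr) , \tag{5.32a}$$

$$\sigma_{y-} = \max\bigl(0, \operatorname{sign}(V_{i,j-1/2})\bigr) , \tag{5.32b}$$

$$\sigma_{y+} = \max\bigl(0, \operatorname{sign}(-V_{i,j+1/2})\bigr) . \tag{5.32c}$$

Then we find

$$H_{13} = T_{ij}\, \frac{\partial A^T_e}{\partial U_{i+1/2,j}} = \begin{cases} -T_{ij}\,\sigma_{x+}/\delta x & \text{if } |U_{i+1/2,j}|\,\delta x \ge 2, \\ -T_{ij}/(2\,\delta x) & \text{otherwise,} \end{cases} \tag{5.33}$$

$$H_{14} = T_{ij}\, \frac{\partial A^T_s}{\partial V_{i,j-1/2}} = \begin{cases} T_{ij}\,\sigma_{y-}/\delta y & \text{if } |V_{i,j-1/2}|\,\delta y \ge 2, \\ T_{ij}/(2\,\delta y) & \text{otherwise,} \end{cases} \tag{5.34}$$

$$H_{15} = T_{ij}\, \frac{\partial A^T_n}{\partial V_{i,j+1/2}} = \begin{cases} -T_{ij}\,\sigma_{y+}/\delta y & \text{if } |V_{i,j+1/2}|\,\delta y \ge 2, \\ -T_{ij}/(2\,\delta y) & \text{otherwise,} \end{cases} \tag{5.35}$$

$$H_{16} = 0 . \tag{5.36}$$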
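The piecewise derivatives (5.30)-(5.35) are easy to get wrong in sign, so a finite-difference check is a useful companion. The sketch below rebuilds each energy-equation coefficient from (2.10)-(2.13) and (2.22)-(2.25) as a function of its own face velocity. It then compares the analytic branch formulas with a central difference evaluated away from the switching point $|U|\,\delta x = 2$. The function names and test velocities are hypothetical.

```python
def hybrid_A(vel, h, plus_side):
    """One energy-equation coefficient as a function of its face velocity.

    plus_side=True gives A_w or A_s (max(|C|, D) + C); False gives A_e or A_n
    (max(|C|, D) - C), with C = vel / (2h) and D = 1/h^2 as in (2.22)-(2.25).
    """
    C, D = vel / (2.0 * h), 1.0 / h**2
    return max(abs(C), D) + C if plus_side else max(abs(C), D) - C

def dA_dvel(vel, h, plus_side):
    """Analytic derivative of hybrid_A, following (5.30)-(5.35) without T_ij."""
    if abs(vel) * h >= 2.0:                       # donor-cell branch
        s = (vel > 0) - (vel < 0)                 # sign(vel) as -1, 0 or 1
        sigma = max(0, s if plus_side else -s)    # eqs. (5.30), (5.32a-c)
        return sigma / h if plus_side else -sigma / h
    return 1.0 / (2.0 * h) if plus_side else -1.0 / (2.0 * h)   # central branch

def temperature_row(T, u_w, u_e, v_s, v_n, dx, dy):
    """(H_12, ..., H_16) = T_ij * dA_c/d(face velocity); H_16 = 0 (pressure)."""
    return [T * dA_dvel(u_w, dx, True), T * dA_dvel(u_e, dx, False),
            T * dA_dvel(v_s, dy, True), T * dA_dvel(v_n, dy, False), 0.0]


print(temperature_row(0.5, 30.0, 5.0, -25.0, -40.0, 0.125, 0.125))
# [4.0, -2.0, 0.0, -4.0, 0.0]
```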

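If we use the factor $1/\omega$ only on the diagonals and use the elements $H_{1q}$ as defined above, the 6x6 matrix approximation to the matrix $H$ in Equation (5.20) has the following form:

$$H = \begin{pmatrix}
\frac{1}{\omega}(A^T_c)_{ij} & H_{12} & H_{13} & H_{14} & H_{15} & 0 \\
0 & \frac{1}{\omega}(A^u_c)_{i-1/2,j} & -(A^u_e)_{i-1/2,j} & 0 & 0 & 1/\delta x \\
0 & -(A^u_w)_{i+1/2,j} & \frac{1}{\omega}(A^u_c)_{i+1/2,j} & 0 & 0 & -1/\delta x \\
-\tfrac12 Ra\,Pr & 0 & 0 & \frac{1}{\omega}(A^v_c)_{i,j-1/2} & -(A^v_n)_{i,j-1/2} & 1/\delta y \\
-\tfrac12 Ra\,Pr & 0 & 0 & -(A^v_s)_{i,j+1/2} & \frac{1}{\omega}(A^v_c)_{i,j+1/2} & -1/\delta y \\
0 & -1/\delta x & 1/\delta x & -1/\delta y & 1/\delta y & 0
\end{pmatrix} . \tag{5.37}$$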

For simplicity, the four elements $-(A^u_e)_{i-1/2,j}$, $-(A^u_w)_{i+1/2,j}$, $-(A^v_n)_{i,j-1/2}$, and $-(A^v_s)_{i,j+1/2}$ are neglected. Recall that the limiting coupling for this problem is the temperature-velocity coupling, and omitting these terms does not affect it. Effectively, we decompose $H = H_0 + \bar H$, where $\bar H$ contains the neglected terms. We then write (5.20) in the form $H_0\Phi' = -\bar H\Phi' + R$ and perform our iteration with the initial correction $\Phi' = 0$. In each mass control volume the task is therefore to solve

$$H_0\,\Phi' = R . \tag{5.38}$$

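The cells are swept in lexicographic order, which means that the interior velocity components are updated twice. The increased rate of convergence compensates for this extra arithmetic, and somewhat greater robustness is achieved. Set

$$\gamma = \bigl(H_{12},\ H_{13},\ H_{14},\ H_{15},\ 0\bigr)^T , \tag{5.39a}$$

$$e = \bigl(0,\ 0,\ -\tfrac12 Ra\,Pr,\ -\tfrac12 Ra\,Pr,\ 0\bigr)^T , \tag{5.39b}$$

$$\alpha = \frac{1}{\omega}(A^T_c)_{ij} , \tag{5.39c}$$

$$\tau = T'_{ij} , \tag{5.39d}$$

$$\chi = \bigl(U'_{i-1/2,j},\ U'_{i+1/2,j},\ V'_{i,j-1/2},\ V'_{i,j+1/2},\ P'_{ij}\bigr)^T , \tag{5.39e}$$

$$f = R^T_{ij} , \tag{5.39f}$$

$$r = \bigl(R^U_{i-1/2,j},\ R^U_{i+1/2,j},\ R^V_{i,j-1/2},\ R^V_{i,j+1/2},\ R^P_{ij}\bigr)^T . \tag{5.39g}$$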
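Also set

$$B = \begin{pmatrix}
\frac{1}{\omega}(A^u_c)_{i-1/2,j} & 0 & 0 & 0 & 1/\delta x \\
0 & \frac{1}{\omega}(A^u_c)_{i+1/2,j} & 0 & 0 & -1/\delta x \\
0 & 0 & \frac{1}{\omega}(A^v_c)_{i,j-1/2} & 0 & 1/\delta y \\
0 & 0 & 0 & \frac{1}{\omega}(A^v_c)_{i,j+1/2} & -1/\delta y \\
-1/\delta x & 1/\delta x & -1/\delta y & 1/\delta y & 0
\end{pmatrix} . \tag{5.40}$$

Then Equation (5.38) has the bordered matrix form

$$\begin{pmatrix} \alpha & \gamma^T \\ e & B \end{pmatrix} \begin{pmatrix} \tau \\ \chi \end{pmatrix} = \begin{pmatrix} f \\ r \end{pmatrix} . \tag{5.41}$$

This equation implies

$$\begin{pmatrix} \tau \\ \chi \end{pmatrix} = \begin{pmatrix} \beta & \eta^T \\ \xi & \Pi \end{pmatrix} \begin{pmatrix} f \\ r \end{pmatrix} , \tag{5.42}$$

where the elements of the inverse are given by

$$\beta^{-1} = \alpha - \gamma^T B^{-1} e , \tag{5.43a}$$

$$\xi = -\beta\, B^{-1} e , \tag{5.43b}$$

$$\eta^T = -\beta\, \gamma^T B^{-1} , \tag{5.43c}$$

$$\Pi = B^{-1} + \beta\, B^{-1} e\, \gamma^T B^{-1} . \tag{5.43d}$$

Thus, if we define $z$ and $y$ by

$$B z = e , \tag{5.44a}$$

$$B y = r , \tag{5.44b}$$

the system (5.41) is solved by

$$\beta = \bigl(\alpha - \gamma^T z\bigr)^{-1} , \tag{5.45a}$$

$$\tau = \beta\,\bigl(f - \gamma^T y\bigr) , \tag{5.45b}$$

$$\chi = y - z\,\tau , \tag{5.45c}$$

and therefore the correction vector for the $(i,j)$-th cell is

$$\Phi' = \begin{pmatrix} \tau \\ \chi \end{pmatrix} . \tag{5.46}$$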
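The bordered elimination (5.44)-(5.45) needs only two solves with the 5x5 block $B$ and one scalar Schur complement. The sketch below implements it with NumPy and checks it against a direct solve of the full 6x6 system. The helper names are hypothetical. All numerical values in the check are arbitrary test inputs, not data from the paper.

```python
import numpy as np

def cell_matrix_B(a_uw, a_ue, a_vs, a_vn, dx, dy, omega):
    """5x5 velocity-pressure block B of eq. (5.40).

    a_uw, a_ue, a_vs, a_vn are (A^u_c)_{i-1/2,j}, (A^u_c)_{i+1/2,j},
    (A^v_c)_{i,j-1/2}, (A^v_c)_{i,j+1/2}. Only the diagonal carries 1/omega,
    and the neglected velocity-velocity couplings inside the cell are left out.
    """
    B = np.zeros((5, 5))
    B[np.arange(4), np.arange(4)] = np.array([a_uw, a_ue, a_vs, a_vn]) / omega
    B[:4, 4] = [1 / dx, -1 / dx, 1 / dy, -1 / dy]   # pressure-gradient column
    B[4, :4] = [-1 / dx, 1 / dx, -1 / dy, 1 / dy]   # continuity row
    return B

def scgs_cell_solve(alpha, gamma, e, B, f, r):
    """Solve [[alpha, gamma^T], [e, B]] [tau; chi] = [f; r] via (5.44)-(5.45)."""
    z = np.linalg.solve(B, e)          # B z = e                    (5.44a)
    y = np.linalg.solve(B, r)          # B y = r                    (5.44b)
    beta = 1.0 / (alpha - gamma @ z)   # scalar Schur complement    (5.45a)
    tau = beta * (f - gamma @ y)       # temperature correction     (5.45b)
    chi = y - z * tau                  # U', U', V', V', P'          (5.45c)
    return tau, chi
```

In a sweep, `tau` and `chi` would be added to $T_{ij}$ and to the five face and pressure unknowns of cell $(i,j)$. Solving with $B$ twice, rather than inverting the 6x6 matrix, keeps the per-cell work close to that of the original 5x5 SCGS update.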

In this study, we scaled the vector in Equation (5.39a) by a specified parameter, and the parameter $\omega$ was always used as an under-relaxation factor.

As noted earlier, the block Gauss-Seidel type scheme described here is an adaptation of the SCGS scheme introduced by Vanka [5], where a further discussion of that scheme may be found. SCGS-type schemes differ substantially from Brandt's Distributive Gauss-Seidel (DGS) scheme. The SCGS-type schemes update all the variables simultaneously. DGS, in contrast, is a decoupled procedure: it corrects all the momentum equations first and then goes back to recorrect the velocities, pressure, and temperature to satisfy the remaining equations.


6.  RESTRICTION  AND  PROLONGATION 

Recall that restriction procedures are used for transferring fine-grid values to a coarse grid; the operation of transferring from grid $k$ to grid $k-1$ is denoted by $I_k^{k-1}$. In this study, the variables are restricted in the same manner as the residual, so the same restriction operator $I_k^{k-1}$ is used for both in Equations (3.3) and (3.4). The prolongation operator $I_{k-1}^k$ is used to transfer variables from a coarse grid to a fine grid and generally involves an interpolation procedure.

In this study, the restriction operators are defined by the simplest average of nearby values. Let $(i_c, j_c)$ and $(i_f, j_f)$ denote coarse- and fine-grid indices corresponding to grid $k-1$ and grid $k$, respectively. In this context, let $u^{k-1}_{ij}$ be referred to as $u^c(i,j)$ and $u^k_{ij}$ as $u^f(i,j)$, with similar notation for $v$. Then $i_f = 2i_c - 1$, $j_f = 2j_c - 1$, and

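$$u^c(i_c,j_c) = \tfrac12\left[u^f(i_f,j_f) + u^f(i_f,j_f-1)\right], \qquad (6.1a)$$

$$v^c(i_c,j_c) = \tfrac12\left[v^f(i_f,j_f) + v^f(i_f-1,j_f)\right], \qquad (6.1b)$$

$$p^c(i_c,j_c) = \tfrac14\left[p^f(i_f,j_f) + p^f(i_f-1,j_f) + p^f(i_f,j_f-1) + p^f(i_f-1,j_f-1)\right], \qquad (6.1c)$$

and

$$T^c(i_c,j_c) = \tfrac14\left[T^f(i_f,j_f) + T^f(i_f-1,j_f) + T^f(i_f,j_f-1) + T^f(i_f-1,j_f-1)\right]. \qquad (6.1d)$$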

Temperature and continuity equation residuals are associated with mesh centers, so boundary values do not occur for these residuals. Momentum residuals are zero on the boundary, since we have no-slip conditions and solid walls. Thus there is no fine-to-coarse transfer of residuals associated with the boundary. The restriction of boundary values of each variable requires separate consideration. The velocities are specified on the boundary, so no fine-to-coarse transfer is needed for boundary velocities. The pressures are associated with cell centers, and no boundary conditions are imposed; thus no fine-to-coarse boundary transfer is needed for the pressure. On the walls where the temperature is specified, no transfer is needed, and on the adiabatic walls the temperature in the border cells is set equal to the temperature in the adjacent cell. Note that in this study none of the restriction operators is of the full weighting type (cf. [2]).

A crucial consequence of Equation (6.1c), which is also used to restrict the continuity residuals, is that the continuity residual for a coarse-grid mass control volume is proportional to the sum of the fine-grid continuity residuals contained in that coarse-grid control volume. Thus satisfaction of the discrete analogue of (2.1) on the fine grid forces satisfaction of this condition on all grids. This is necessary for a solution of the coarse-grid equations to exist. (Recall that the operators are singular.)
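The sum property can be checked on a small example. The following is a minimal sketch of a four-cell average restriction like (6.1c)/(6.1d), acting on a square array of cell-centred values. It uses 0-based indexing in which coarse cell $(I, J)$ covers fine cells $(2I, 2I+1) \times (2J, 2J+1)$. This is an assumption for illustration: the paper's own index offsets, which depend on its border-cell convention, are not reproduced.

```python
# Four-cell average restriction of cell-centred values (0-based indices).
# Coarse cell (I, J) is taken to cover fine cells (2I..2I+1, 2J..2J+1).

def restrict_cell_average(fine):
    nc = len(fine) // 2
    return [[0.25 * (fine[2 * I][2 * J] + fine[2 * I + 1][2 * J]
                     + fine[2 * I][2 * J + 1] + fine[2 * I + 1][2 * J + 1])
             for J in range(nc)]
            for I in range(nc)]


def total(grid):
    """Sum of all entries of a 2D list."""
    return sum(sum(row) for row in grid)
```

The coarse total equals one quarter of the fine total. A fine-grid residual field that sums to zero, as the compatibility condition requires, therefore restricts to a coarse residual field that also sums to zero.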

The coarse-to-fine (prolongation) operators $I_{k-1}^k$ are based on bilinear interpolation. Thus the coarse-to-fine transfer of u-velocity values is defined as follows:

$$u^f(i_f,j_f) = \tfrac14(3u_1 + u_2), \qquad (6.2a)$$

$$u^f(i_f,j_f+1) = \tfrac14(u_1 + 3u_2), \qquad (6.2b)$$

$$u^f(i_f+1,j_f) = \tfrac18(3u_1 + u_2 + 3u_3 + u_4), \qquad (6.2c)$$

$$u^f(i_f+1,j_f+1) = \tfrac18(u_1 + 3u_2 + u_3 + 3u_4), \qquad (6.2d)$$

where

$$u_1 = u^c(i_c,j_c);\quad u_2 = u^c(i_c,j_c+1); \qquad (6.3a)$$

$$u_3 = u^c(i_c+1,j_c);\quad u_4 = u^c(i_c+1,j_c+1). \qquad (6.3b)$$
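The weights in (6.2) can be checked by interpolating a bilinear function exactly. The following minimal sketch assumes that, in coarse-mesh units, $u_1, u_2, u_3, u_4$ sit at $(0,0), (0,1), (1,0), (1,1)$ and that the four fine values lie at $(0,\tfrac14), (0,\tfrac34), (\tfrac12,\tfrac14), (\tfrac12,\tfrac34)$. These positions are our reading of the weights, not coordinates stated in the paper. The function name `prolong_u` is hypothetical.

```python
# Interior u-velocity prolongation (6.2); returns fine values keyed by the
# index offset (di, dj) from (i_f, j_f).

def prolong_u(u1, u2, u3, u4):
    return {
        (0, 0): (3 * u1 + u2) / 4,                 # (6.2a)
        (0, 1): (u1 + 3 * u2) / 4,                 # (6.2b)
        (1, 0): (3 * u1 + u2 + 3 * u3 + u4) / 8,   # (6.2c)
        (1, 1): (u1 + 3 * u2 + u3 + 3 * u4) / 8,   # (6.2d)
    }
```

With those positions, every function of the form $c_0 + c_1x + c_2y + c_3xy$ is reproduced exactly, which is what "bilinear interpolation" requires.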

At the top and bottom mesh cells, the above formulas are modified to reflect the fact that we associate the velocities in the border mesh cells with the wall values. The fine-grid v-velocities are defined by analogous expressions. Since the pressure and temperature values are associated with the mesh cell centers, their interpolation formulas have a different weighting from the velocities. At interior mesh cells, the coarse-to-fine temperature transfers are given as follows:


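$$T^f(i_f,j_f) = \tfrac{1}{16}(9T_1 + 3T_2 + 3T_3 + T_4), \qquad (6.4a)$$

$$T^f(i_f,j_f+1) = \tfrac{1}{16}(3T_1 + T_2 + 9T_3 + 3T_4), \qquad (6.4b)$$

$$T^f(i_f+1,j_f) = \tfrac{1}{16}(3T_1 + 9T_2 + T_3 + 3T_4), \qquad (6.4c)$$

$$T^f(i_f+1,j_f+1) = \tfrac{1}{16}(T_1 + 3T_2 + 3T_3 + 9T_4), \qquad (6.4d)$$

where

$$T_1 = T^c(i_c,j_c);\quad T_2 = T^c(i_c+1,j_c); \qquad (6.5a)$$

$$T_3 = T^c(i_c,j_c+1);\quad T_4 = T^c(i_c+1,j_c+1). \qquad (6.5b)$$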

At the top and bottom walls a zero-derivative condition holds; thus, for example, bilinear interpolation for the top mesh cells gives

$$T^f(i_f,j_f) = \tfrac14(3T_1 + T_2), \qquad (6.6a)$$

$$T^f(i_f+1,j_f) = \tfrac14(T_1 + 3T_2), \qquad (6.6b)$$

where

$$T_1 = T^c(i_c,j_c);\quad T_2 = T^c(i_c+1,j_c), \qquad (6.7)$$

with corresponding expressions for the bottom mesh cells. At the left and right walls, the temperature is specified; thus, for example, bilinear interpolation for the left mesh cells gives
$$T^f(i_f,j_f) = \tfrac18(3T_1 + 3T_2 + T_3 + T_4), \qquad (6.8a)$$

$$T^f(i_f,j_f+1) = \tfrac18(T_1 + T_2 + 3T_3 + 3T_4), \qquad (6.8b)$$

where

$$T_1 = T^c(i_c,j_c) = \text{boundary value};\quad T_2 = T^c(i_c+1,j_c); \qquad (6.9a)$$

$$T_3 = T^c(i_c,j_c+1) = \text{boundary value};\quad T_4 = T^c(i_c+1,j_c+1), \qquad (6.9b)$$

with the corresponding expressions for the rightmost mesh cells. The interpolations in the corner mesh cells are handled in the obvious way. The pressure is treated in a manner similar to the temperature, except that a zero-derivative condition is assumed to hold on all boundaries. This approximation is used only in the prolongation operator; since the continuity equation is satisfied in the relaxation phase, pressure boundary conditions are otherwise unnecessary.
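The interior weights (6.4) and the top-wall formulas (6.6) are related. In our reading, (6.6) is what (6.4) gives when the missing row above the wall is replaced by mirrored values $T_3 := T_1$ and $T_4 := T_2$, which is a discrete zero-derivative condition. The sketch below encodes that assumption. The function names are hypothetical, and the fine-cell positions used in the check ($\tfrac14$ and $\tfrac34$ of the coarse spacing) are inferred from the weights.

```python
# Cell-centred temperature prolongation (6.4); T1..T4 as in (6.5).
# Keys are index offsets (di, dj) from (i_f, j_f).

def prolong_T(T1, T2, T3, T4):
    return {
        (0, 0): (9 * T1 + 3 * T2 + 3 * T3 + T4) / 16,   # (6.4a)
        (0, 1): (3 * T1 + T2 + 9 * T3 + 3 * T4) / 16,   # (6.4b)
        (1, 0): (3 * T1 + 9 * T2 + T3 + 3 * T4) / 16,   # (6.4c)
        (1, 1): (T1 + 3 * T2 + 3 * T3 + 9 * T4) / 16,   # (6.4d)
    }


def prolong_T_top_wall(T1, T2):
    """Top-wall cells: mirror T3 := T1, T4 := T2 (zero normal derivative)."""
    fine = prolong_T(T1, T2, T1, T2)
    return fine[(0, 0)], fine[(1, 0)]   # reduce to (6.6a) and (6.6b)
```

The mirrored version collapses algebraically to $\tfrac14(3T_1 + T_2)$ and $\tfrac14(T_1 + 3T_2)$. This is why only the two coarse values along the wall appear in (6.6).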

Notice that the prolongation operator that appears in Equation (3.4) acts on what is essentially a correction to the solution on the $(k-1)$st grid. This is the operator we have been discussing in this section. In the present study we have used the same prolongation operator for the corrections as for the solutions.

It should be emphasized that in an FAS-type algorithm, the values on the coarse grid are not directly prolongated (see Equation (3.4)); rather, the changes from previously restricted values are prolongated. That is,

$$w^k \leftarrow w^k + I_{k-1}^k\,\delta w^{k-1}, \qquad (6.10a)$$

$$\delta w^{k-1} = w^{k-1} - I_k^{k-1} w^k. \qquad (6.10b)$$

The operators defined in Equations (6.1)-(6.9) are applied to $\delta w^{k-1}$.
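The difference between prolongating changes and prolongating values is easiest to see in one dimension. The following minimal sketch uses illustrative stand-ins for the paper's operators: pairwise averaging as the restriction and injection as the prolongation. Only the structure of the update follows (6.10); the 1D setting and the operator choices are assumptions.

```python
# FAS coarse-to-fine update (6.10) with simple 1D stand-in transfer operators.

def restrict_pairs(w):
    """Average each pair of fine values (stand-in for I_k^{k-1})."""
    return [0.5 * (w[2 * i] + w[2 * i + 1]) for i in range(len(w) // 2)]


def prolong_inject(dw):
    """Copy each coarse value to its two fine cells (stand-in for I_{k-1}^k)."""
    return [v for v in dw for _ in range(2)]


def fas_update(w_fine, w_coarse, restrict=restrict_pairs, prolong=prolong_inject):
    delta = [wc - r for wc, r in zip(w_coarse, restrict(w_fine))]   # (6.10b)
    corr = prolong(delta)
    return [wf + c for wf, c in zip(w_fine, corr)]                  # (6.10a)
```

If the coarse solve leaves the restricted values unchanged, the fine solution is untouched. Prolongating the coarse values themselves would instead overwrite the fine-scale detail. For example, injecting `[2.0, 4.0]` would replace `[1.0, 3.0, 2.0, 6.0]` with `[2.0, 2.0, 4.0, 4.0]`.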


7.  COARSEST  GRID  SOLUTION 

The relaxation sweeps described in Section 5 are used on every grid level except the coarsest grid, $k = 1$. On this grid, where the dimension of the system represented in Equation (4.1) is small, we solve the system using a Newton-Raphson type procedure combined with a direct solution of the resulting linear system. The matrix for the linear system is generated from the Jacobian of the operator appearing in Equation (4.1) on the coarsest grid. (As usual, the pressure non-uniqueness is eliminated by defining a reference pressure at one point. This does not affect the rate of convergence of the multigrid cycle, since it is applied only on the coarsest grid.)
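The coarsest-grid procedure follows the generic Newton-Raphson pattern: evaluate the residual, form the Jacobian, solve the linear system directly, and update. The minimal sketch below shows only that generic pattern on a toy two-equation system. It does not assemble the flow Jacobian, and it omits the reference-pressure pinning, which in the paper's setting removes the constant-pressure null space before the direct solve. The names `gauss_solve` and `newton` are hypothetical.

```python
# Generic Newton-Raphson with a direct linear solve at each step.

def gauss_solve(A, b):
    """Direct solve of A x = b (Gaussian elimination, partial pivoting)."""
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            m = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= m * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x


def newton(F, J, x0, tol=1e-12, max_iter=50):
    """Solve F(x) = 0: at each step solve J(x) dx = -F(x), then x <- x + dx."""
    x = [float(v) for v in x0]
    for _ in range(max_iter):
        r = F(x)
        if max(abs(v) for v in r) < tol:
            return x
        dx = gauss_solve(J(x), [-v for v in r])
        x = [xi + di for xi, di in zip(x, dx)]
    raise RuntimeError("Newton iteration did not converge")
```

For example, `newton(lambda v: [v[0]**2 + v[1]**2 - 4, v[0] - v[1]], lambda v: [[2*v[0], 2*v[1]], [1, -1]], [1, 1])` converges to $(\sqrt2, \sqrt2)$ in a handful of iterations.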

We will illustrate the nature of the Jacobian in the case of the temperature equation. Recall from Section 2 that the finite difference equation for the temperature has the following form when considered at the $(i,j)$ mesh cell:

$$F^T_c = A^T_w(T_c - T_w) + A^T_e(T_c - T_e) + A^T_s(T_c - T_s) + A^T_n(T_c - T_n) = R_c, \qquad (7.1)$$


where

$$T_c = T(i,j);\quad T_w = T(i-1,j);\quad T_e = T(i+1,j);\quad T_s = T(i,j-1);\quad T_n = T(i,j+1), \qquad (7.2)$$

and

$$A^T_w = (A^T_w)_{ij}, \text{ etc.} \qquad (7.3)$$

Setting

$$U_c = U(i+\tfrac12,j);\quad U_w = U(i-\tfrac12,j);\quad U_e = U(i+\tfrac32,j);\quad U_s = U(i+\tfrac12,j-1);\quad U_n = U(i+\tfrac12,j+1),$$

with  similar  notation  for  the  v-velocities,  we  can  write  the  defining  relations  for  the  coefficients  as 
follows: 

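$$A^T_w = A^T_w(U_w) = \max\!\left(\frac{1}{\delta x^2}, \frac{|U_w|}{2\delta x}\right) + \frac{U_w}{2\delta x}, \qquad (7.4a)$$

$$A^T_e = A^T_e(U_c) = \max\!\left(\frac{1}{\delta x^2}, \frac{|U_c|}{2\delta x}\right) - \frac{U_c}{2\delta x}, \qquad (7.4b)$$

$$A^T_s = A^T_s(V_s) = \max\!\left(\frac{1}{\delta y^2}, \frac{|V_s|}{2\delta y}\right) + \frac{V_s}{2\delta y}, \qquad (7.4c)$$

$$A^T_n = A^T_n(V_c) = \max\!\left(\frac{1}{\delta y^2}, \frac{|V_c|}{2\delta y}\right) - \frac{V_c}{2\delta y}. \qquad (7.4d)$$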

Thus we have the following dependencies:

$$F^T_c = F^T(T_c, T_w, T_e, T_s, T_n, U_w, U_c, V_s, V_c). \qquad (7.5)$$

The Jacobian is generated by using the partial derivatives of $F^T$ with respect to each of these variables. In the case of the temperature dependence, we have, for example,

$$\frac{\partial F^T_c}{\partial T_c} = A^T_w + A^T_e + A^T_s + A^T_n = A^T_c. \qquad (7.6)$$


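In the case of velocity dependence, for example on $U_w$, we have

$$\frac{\partial F^T_c}{\partial U_w} = \frac{dA^T_w}{dU_w}\,(T_c - T_w), \qquad \frac{dA^T_w}{dU_w} = \begin{cases} 1/(2\delta x) & \text{if } \delta x\,|U_w| < 2, \\ 1/\delta x & \text{if } \delta x\,|U_w| \ge 2 \text{ and } U_w > 0, \\ 0 & \text{otherwise,} \end{cases}$$

with similar expressions corresponding to the other variables. Then, as mentioned earlier, the resulting linear system is solved by a direct method.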
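The piecewise derivative follows directly from the three regimes of the hybrid coefficient (7.4a): central differencing when $\delta x|U_w| < 2$, upwind when the flow comes from the west, and a zero coefficient otherwise. The short sketch below, with hypothetical function names, encodes both the coefficient and its derivative so that the Jacobian entry can be checked against a finite difference away from the kinks.

```python
# Hybrid-differencing coefficient (7.4a) and its derivative dA_w/dU_w.

def a_west(U, dx):
    """A_w^T(U_w) = max(1/dx^2, |U|/(2 dx)) + U/(2 dx)."""
    return max(1.0 / dx**2, abs(U) / (2 * dx)) + U / (2 * dx)


def da_west_dU(U, dx):
    """Derivative used for dF_c^T/dU_w = dA_w/dU_w * (T_c - T_w)."""
    if dx * abs(U) < 2:                 # central-difference regime
        return 1.0 / (2 * dx)
    return 1.0 / dx if U > 0 else 0.0   # upwind regime: A = U/dx or A = 0
```

With `dx = 0.1` the switch occurs at $|U_w| = 20$. Below that value the coefficient is $100 + 5U_w$. Above it, $A^T_w$ equals $U_w/\delta x$ for positive velocities and vanishes for negative ones.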


8.  RESULTS 

The algorithm described in Sections 3-7 has been applied to the laminar double-glazing problem. Streamlines and isotherms are shown in Figures 2 and 3. The complex flow structures introduced by the nonlinear coupling of the equations can be seen clearly. Moreover, the fine detail present at the highest Rayleigh number means that a small mesh spacing has to be used to resolve the boundary layers. The present version of our code uses uniform meshes. This is not the most efficient choice for this class of problem, but it does allow demonstration of the effectiveness of the algorithm on this system of equations.

FIGURE 3. Contour maps of the temperature

First, we consider the accuracy of the solution method. We concentrate on the case with the highest Rayleigh number, which proved to be the most difficult; these results are typical. Table 1 compares two parameters from our calculations with those obtained by Kessler and Oertel [11] and de Vahl Davis [10] in the benchmarking exercise mentioned earlier. (These solutions were judged particularly accurate [12].) The parameters are the maximum horizontal velocity component in the vertical mid-plane (with its vertical position shown in the second row) and the corresponding quantities in the horizontal mid-plane. Our results agree closely with those from the earlier studies. These quantities are the ones demanded in the original statement of the double-glazing problem and are reasonably sensitive to any errors in the solutions. Thus we have some confidence that a closer comparison would reveal no anomalies.

Providing some sort of error estimate is extremely useful for all numerical solution techniques, and the additional information available with multigrid algorithms facilitates this. The simplest way (assuming an exact solution is not known) is to use the defect $\tau$, which approximates the local truncation error. In Figure 4 we plot this estimate for each grid level in the 256×256 calculation. On the finest grids the slope shows that we are in the asymptotic regime and that the errors behave as $O(h)$. On the coarse grids the errors are actually increasing. In fact, the thickness of the vertical boundary layer is about 1/30, so the discrete problem is a very poor approximation to the continuous one for $h \ge 1/32$. Obtaining estimates of the actual errors requires solving some extra equations, which is beyond the scope of the present work.
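Reading an order of accuracy off a log-log plot amounts to taking $\log_2$ of the ratio of defect norms on successive grids, each with half the mesh width of the previous one. The sketch below illustrates the calculation on synthetic norms only. It does not use the data behind Figure 4, and the function name is hypothetical.

```python
import math


def observed_orders(defect_norms):
    """Order p between consecutive grids, each with half the mesh width
    of the previous one: p = log2(norm_coarse / norm_fine)."""
    return [math.log2(coarse / fine)
            for coarse, fine in zip(defect_norms, defect_norms[1:])]
```

In the asymptotic regime these ratios settle to a constant, about 1 for $O(h)$ behavior. Ratios that drift or fall below zero on the coarse levels indicate that the mesh does not yet resolve the boundary layer.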

Table 2 shows the average rate of convergence and the times per cycle for the various Rayleigh-number/Prandtl-number combinations for F(2,2) cycles. This is a conservative cycling strategy, but it appears to be reasonably efficient. Moreover, experiments with different values of $\nu_1$ and $\nu_2$ show only small changes in total CPU time. As can be seen from Table 2, it proved necessary to introduce some under-relaxation to force convergence at the highest Rayleigh numbers (i.e., the problems with the largest Fréchet derivative); this was done as indicated in Equation (5.40). Moreover, for the $Ra = 10^6$ case, it was necessary to increase the size of the coarsest grid from 2×2 (which was adequate for the lower Rayleigh numbers) to 4×4. This was disappointing; a parameter-free algorithm would have been preferable. The convergence data displayed in Tables 2, 3, and 4 indicate that this is a problem associated with our treatment of the nonlinearities, and various techniques are being considered to ameliorate it. However, it seems likely that more sophisticated handling of the nonlinearities will carry a major overhead in computing time, and it is not clear that the new technique will prove worthwhile.

The data presented in Table 3 show that the rate of convergence is sensitive to the under-relaxation factor and the cycle type. The behavior of the algorithm is complex because of the high degree of nonlinearity and the use of hybrid differencing. This difference scheme uses stable upwind differencing at high mesh-Reynolds numbers and more accurate central differencing at low mesh-Reynolds numbers. Thus the coarse-grid operators are only fair approximations to the fine-grid ones. Moreover, at the highest Rayleigh numbers there is a large domain where very low velocities allow central differencing even on the coarse grids. The accuracy of the coarse-grid correction therefore varies and affects the convergence in a complex fashion, which makes the amount of work that should be spent on the different grids extremely difficult to predict. Our experiments show that F-cycles are the most efficient. V-cycles clearly do not provide sufficiently good corrections. W-cycles are marginally better than F-cycles in terms of convergence rates for the same under-relaxation parameter (Table 3(b)), although the
TABLE 1. Accuracy of results - Double glazing problem

Maximum horizontal velocity in vertical mid-plane (with location)
Maximum vertical velocity in the horizontal mid-plane (with location)
(Rayleigh number: 10^6, Prandtl number: 0.71)

Grid                 32^2     64^2     128^2    256^2    Benchmark (1)   Benchmark (2)
U_max (x = 0.5)      66.18    65.43    64.99    64.88    64.63           65.21
   y =               0.86     0.85     0.85     0.85     0.850           0.854
V_max (y = 0.5)      202.4    221.6    217.9    220.8    219.36          220.4
   x =               0.047    0.039    0.043    0.037    0.0379          0.039

Note: (1) de Vahl Davis [10]
      (2) Kessler and Oertel [11]

FIGURE 4. Local truncation error estimates for double-glazing problem (Rayleigh number 10^6, Prandtl number 0.71; thickness of vertical boundary layer approximately 1/30)
TABLE 2(a). Convergence rate as a function of Rayleigh number

(Prandtl number 0.71, F(2,2) cycles, 64x64 grid)

Ra            ω = 1.0    ω = 0.395
1.0           0.084      0.26
10            0.085      0.295
10^2          0.099      0.306
10^3          0.091      0.315
10^4 - 10^6   [entries only partly legible in the source; remaining legible values: 0.352, 0.45]

TABLE 2(b). Convergence rate as a function of Prandtl number

(Rayleigh number 10^4, F(2,2) cycles, 64x64 grid)

Pr      ω = 1.0    ω = 0.395
0.1     0.47       0.47
0.5     0.15       0.32
0.71    0.12       0.28
5.0     0.12       0.32

TABLE 3(a). Rate of convergence as a function of cycle type and under-relaxation factor

(Rayleigh number 10^6, Prandtl number 0.71, grid 64^2)
(Calculations terminated when ||Residual|| < 10^-4)

F(2,2) cycles         V(2,2) cycles         W(2,2) cycles
ω        # cycles     ω        # cycles     ω        # cycles
0.3      31           0.15     73           0.30     31
0.35     27           0.17     71           0.31     30
0.36     26           0.18     74           0.32     29
0.37     25           ?        72           0.33     28
0.38     24           ?        72           0.34     27
0.39     24           ?        75           0.345    28
0.395    23
0.40     24
0.4014   26

(? = value illegible in the source)
TABLE 3(b). Rate of convergence as a function of cycle type

(Prandtl number 0.71, grid 64^2)

Cycle      Ra = 10^4, ω = 1.0     Ra = 10^6, ω = 0.2
F(2,2)     0.124                  0.68
V(2,2)     0.427                  0.77
W(2,2)     0.106                  0.67

benefits are slight because of the hybrid differencing. Even in this case, the extra arithmetic in W-cycles makes them less efficient. Somewhat surprisingly, F-cycles have significantly better stability properties, allowing larger under-relaxation parameters and better convergence rates.

In Table 4 we show the rate of convergence for the most difficult problem ($Ra = 10^6$) for a range of grid sizes. Linear multigrid theory predicts a rate independent of mesh size, and we see the same sort of behavior here. This is very encouraging: we are observing proper multigrid behavior, with rates of convergence that are extremely good for the lower Rayleigh number cases and still reasonable for the high Rayleigh number case, even though under-relaxation has been introduced.

TABLE 4. Rate of convergence as a function of grid size

(Prandtl number: 0.71)

Grid 

Rate  of 

Grid 

Rate  of 

size 

© 

convergence 

size 

© 

convergence 

322 

II 

0.5 

322 

1.0 

0.17 

642 

0.45 

642 

1.0 

1282 

0.367 

1282 

1.0 

2562 

0.395 

0.5 

25 62 

1.0 

0.077 

It is possible that the initial goal of finding a parameter-free algorithm that would be highly efficient over a very wide range of nonlinearities was overambitious. Almost all computational fluid dynamics codes employ parameter-dependent techniques such as under-relaxation, and the multigrid convergence rates achieved here are one or two orders of magnitude better than those of corresponding single-grid algorithms.


9.  CONCLUSIONS 

A novel multigrid algorithm has been presented for buoyancy-induced flows. The relaxation scheme avoids the introduction of Brunt-Väisälä oscillations, which limit the performance of classical, segregated approaches. The new method appears to be reasonably efficient and robust, converging over a range of physical parameters from a zero initial guess.

It is necessary to use some under-relaxation at the highest Rayleigh numbers. The reasons are being investigated.
for Interface Problems

A multigrid method is presented for cell-centered discretizations of elliptic partial differential equations. The method works for both smooth and strongly discontinuous coefficients, even though, in contrast with earlier works, the prolongation and restriction operators do not depend on the equation.

1.  INTRODUCTION 


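The multigrid method to be presented will be developed for the following equation:

$$-\frac{\partial}{\partial x}\left(a\frac{\partial\phi}{\partial x}\right) - \frac{\partial}{\partial y}\left(a\frac{\partial\phi}{\partial y}\right) = f, \quad (x,y) \in \Omega = (0,1)\times(0,1), \quad \phi|_{\partial\Omega} = g, \quad a > 0. \qquad (1.1)$$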


The coefficient $a(x,y)$ is not continuous everywhere. This precludes application of standard multigrid methods. Alcouffe et al. (1981), Dendy (1982), and Kettler and Meijerink (1981) (see also Kettler (1982)) have developed special multigrid methods that work well for the problem considered here. In these methods the prolongation and restriction operators depend on the discrete approximation to (1.1). So far, theoretical justification is lacking and seems hard to come by. In the following, a multigrid method is proposed for (1.1) that also works in practice, is simpler, and can be justified theoretically. It differs from the methods just mentioned in that prolongation and restriction are not problem-dependent, and in that grid coarsening is done cell-wise rather than point-wise. What this means will be made clear in the sequel.

2.  FINITE  VOLUME  DISCRETIZATION 

For convenience, the mesh size will be $h$ in both directions. The domain $\Omega$ is subdivided into finite volumes or cells, which are squares of size $h$, with centers at the points

$$\Omega_h = \{(x,y):\ x = x_i = (i - \tfrac12)h,\ y = y_j = (j - \tfrac12)h;\ i,j = 1,2,\ldots,n;\ h = 1/n\}. \qquad (2.1)$$

The cell with center at $(x_i, y_j)$ is denoted by $\Omega_{ij}$, and $\phi_{ij}$ is the value of $\phi$ at the center. Often, this is called a block-centered or cell-centered grid. Forward and backward divided differences in the x- and y-directions are defined by

$$\Delta_x\phi_{ij} = (\phi_{i+1,j} - \phi_{ij})/h, \qquad \nabla_x\phi_{ij} = (\phi_{ij} - \phi_{i-1,j})/h, \qquad (2.2)$$

and similarly for $\Delta_y$ and $\nabla_y$.

For completeness we briefly review the elementary aspects of finite volume discretization of (1.1) with discontinuous coefficient $a$. Eq. (1.1) is integrated over the finite volume $\Omega_{ij}$. With the Gauss divergence theorem this results in

$$-\oint_{\partial\Omega_{ij}} a\,\frac{\partial\phi}{\partial x_\alpha}\,n_\alpha\,d\Gamma = \int_{\Omega_{ij}} f\,d\Omega \cong h^2 f_{ij}, \qquad (2.3)$$

with the summation convention for the index $\alpha$.

Let $S^\alpha_{ij}$ be the side of $\Omega_{ij}$ with outward normal in the $x_\alpha$-direction ($x_1 = x$, $x_2 = y$). Eq. (2.3) can be rewritten as

$$-\nabla_\alpha F^\alpha_{ij} \cong h f_{ij}, \qquad (2.4)$$

with the flux $F^\alpha$ defined as

$$F^\alpha_{ij} = \int_{S^\alpha_{ij}} a\,\frac{\partial\phi}{\partial x_\alpha}\,d\Gamma. \qquad (2.5)$$

We discuss the approximation of $F^1_{ij}$; $F^2_{ij}$ is treated similarly. $F^1_{ij}$ is approximated as follows:

$$F^1_{ij} \cong h\,a_{ij}\,\frac{\partial\phi}{\partial x}(x_i + h/2, y_j), \qquad (2.6)$$

where $a_{ij}$ is the average of $a$ over $\Omega_{ij}$. The approximation

$$\frac{\partial\phi}{\partial x}(x_i + h/2, y_j) \cong \Delta_x\phi_{ij} \qquad (2.7)$$

is out of the question, since $a_{ij}$ may differ strongly between adjacent cells, so that $\partial\phi/\partial x$ may have large jumps at cell boundaries. A correct approximation is obtained as follows. The point of departure is that $\phi$ and $a\,\partial\phi/\partial x$ are continuous. Denote $\phi(x_i + h/2, y_j)$ by $\phi^*$ for brevity. Then we approximate $F^1_{ij}$ by

$$F^1_{ij} \cong 2a_{ij}(\phi^* - \phi_{ij}) = 2a_{i+1,j}(\phi_{i+1,j} - \phi^*). \qquad (2.8)$$

Elimination of $\phi^*$ from (2.8) results in
$$F^1_{ij} \cong h\,w^x_{ij}\,\Delta_x\phi_{ij} \qquad (2.9)$$

with

$$w^x_{ij} = 2a_{ij}a_{i+1,j}/(a_{ij} + a_{i+1,j}). \qquad (2.10)$$

Similarly, we obtain

$$F^2_{ij} \cong h\,w^y_{ij}\,\Delta_y\phi_{ij} \qquad (2.11)$$

with

$$w^y_{ij} = 2a_{ij}a_{i,j+1}/(a_{ij} + a_{i,j+1}). \qquad (2.12)$$

Substitution of (2.9) and (2.11) in (2.4) results in

$$-(\nabla_x w^x \Delta_x + \nabla_y w^y \Delta_y)\,\phi = f. \qquad (2.13)$$

It is easy to see that $w^x$ and $w^y$ satisfy

$$\inf(a) \le w^x, w^y \le \sup(a). \qquad (2.14)$$

The Dirichlet boundary condition is implemented as follows. Consider the side $x = 0$. There $F^1_{0j}$ is approximated by (cf. (2.8)):

$$F^1_{0j} \cong 2a_{1j}(\phi_{1j} - g_j). \qquad (2.15)$$

A Neumann boundary condition gives $F^1_{0j}$ directly.
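A one-dimensional analogue shows why the harmonic weights (2.10) are the right choice. For $-(a\phi')' = 0$ with cell-wise constant $a$, the exact solution is piecewise linear with constant flux. The scheme built from (2.10) and the boundary flux (2.15) reproduces it exactly at the cell centers, however large the jump in $a$. The following minimal sketch makes that check. The 1D reduction, the tridiagonal (Thomas) solver and the function names are our own illustrative choices.

```python
def harmonic_weights(a):
    """w_i = 2 a_i a_{i+1} / (a_i + a_{i+1}) on interior faces (1D form of (2.10))."""
    return [2.0 * a[i] * a[i + 1] / (a[i] + a[i + 1]) for i in range(len(a) - 1)]


def solve_interface_1d(a, g0, g1, f=None):
    """Cell-centred finite volumes for -(a phi')' = f on (0, 1) with
    phi(0) = g0, phi(1) = g1; a[i] is constant in cell i.
    Boundary fluxes use the half-cell formula of (2.15)."""
    n = len(a)
    h = 1.0 / n
    f = f if f is not None else [0.0] * n
    w = harmonic_weights(a)
    left = [2.0 * a[0]] + w        # coupling through the left face of each cell
    right = w + [2.0 * a[-1]]      # coupling through the right face of each cell
    diag = [lf + rt for lf, rt in zip(left, right)]
    rhs = [h * h * fi for fi in f]
    rhs[0] += left[0] * g0         # Dirichlet data enter via the boundary flux
    rhs[-1] += right[-1] * g1
    # Thomas algorithm: -left[i] phi[i-1] + diag[i] phi[i] - right[i] phi[i+1] = rhs[i]
    # (the boundary couplings left[0], right[-1] only enter diag and rhs)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = -right[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] + left[i] * cp[i - 1]
        cp[i] = -right[i] / m
        dp[i] = (rhs[i] + left[i] * dp[i - 1]) / m
    phi = [0.0] * n
    phi[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        phi[i] = dp[i] - cp[i] * phi[i + 1]
    return phi
```

Replacing the harmonic mean with the arithmetic mean $(a_i + a_{i+1})/2$ loses this exactness. Across a strong jump, the arithmetic mean badly overestimates the flux, which is the failure that (2.7) warns against.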
The reader is assumed to be familiar with multigrid methods. For an introduction, see for example Hackbusch and Trottenberg (1982), Hackbusch (1985) or McCormick (1987).

Coarse grids are constructed cell-wise. That is, coarser grids $\Omega_{2h}, \Omega_{4h}, \ldots$ are obtained by successively doubling $h$ in (2.1). Hence, each coarse cell is the union of four finer cells. The cell centers of a coarse grid do not belong to the next finer grid. This is different from point-wise coarsening, where coarse grids are constructed by deleting grid points, so that coarse grid points always belong to a finer grid.

The grid with mesh size $h$ is denoted by $\Omega_h$, and $\Phi_h = \{\phi_h : \Omega_h \to \mathbb{R}\}$ is the corresponding set of grid functions. Elements of $\Phi_h$ are denoted by $\phi_h$.

In this section the choice of prolongation and restriction operators

$$P_h: \Phi_{2h} \to \Phi_h, \qquad R_{2h}: \Phi_h \to \Phi_{2h}$$

is discussed. One possibility is

$$(P_h\phi_{2h})_{2i,2j} = (P_h\phi_{2h})_{2i-1,2j} = (P_h\phi_{2h})_{2i,2j-1} = (P_h\phi_{2h})_{2i-1,2j-1} = (\phi_{2h})_{ij}. \qquad (3.1)$$

A possibility for $R_{2h}$ is

$$R_{2h} = P_h^*, \qquad (3.2)$$
with superscript $*$ denoting the adjoint. With the inner product

$$(u_h, v_h) = h^2 \sum_{i,j} u_{ij}\,v_{ij}, \qquad (3.3)$$

we find that the stencil of $R_{2h}$ defined by (3.1), (3.2) is

$$[R_{2h}] = \frac14\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, \qquad (3.4)$$

where $[\cdot]$ denotes the stencil of the corresponding operator.

$P_h$ and $R_{2h}$ interpolate exactly only polynomials of degree 0. Their orders $m_P$, $m_R$ are defined as the maximum degree of exactly interpolated polynomials plus 1; hence for (3.1), (3.2) we have

$$m_P = m_R = 1. \qquad (3.5)$$

We must have

$$m_P + m_R > 2m \qquad (3.6)$$

(Brandt (1977), Hackbusch (1985)), with $2m$ the order of the differential equation to be solved. Since (1.1) is of second order ($2m = 2$) while (3.5) gives $m_P + m_R = 2$, (3.1), (3.2) are not right for (1.1). See Wesseling (1987) for what happens when one does use (3.1), (3.2) for (1.1).
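The order of a transfer operator can be measured directly by applying it to samples of monomials $x^k$ and finding the first degree that is not reproduced. The following 1D cell-centred sketch illustrates this for the analogues of the two prolongations discussed here. It uses piecewise-constant injection, as in (3.1), and linear interpolation with weights $\tfrac34, \tfrac14$, whose adjoint plays the role of (3.7). The 1D reduction and all names are assumptions for illustration; the paper's (3.7) is a 2D, 7-point-compatible stencil.

```python
def prolong_constant(coarse):
    """1D analogue of (3.1): both fine cells in coarse cell i get its value."""
    return {2 * i + s: v for i, v in enumerate(coarse) for s in (0, 1)}


def prolong_linear(coarse):
    """1D cell-centred linear interpolation on interior cells: 3/4 from the
    coarse cell containing the fine cell, 1/4 from the nearer neighbour."""
    fine = {}
    for i in range(1, len(coarse) - 1):
        fine[2 * i] = 0.75 * coarse[i] + 0.25 * coarse[i - 1]
        fine[2 * i + 1] = 0.75 * coarse[i] + 0.25 * coarse[i + 1]
    return fine


def interpolation_order(prolong, n_coarse=8, max_deg=4):
    """Maximum degree of exactly reproduced monomials, plus 1."""
    H = 1.0 / n_coarse
    h = H / 2
    for k in range(max_deg + 1):
        coarse = [((i + 0.5) * H) ** k for i in range(n_coarse)]
        fine = prolong(coarse)
        if any(abs(v - ((j + 0.5) * h) ** k) > 1e-12 for j, v in fine.items()):
            return k
    return max_deg + 1
```

Injection has order 1 and linear interpolation has order 2. Pairing orders 1 and 1 gives $2 \not> 2m = 2$, which violates (3.6). Pairing orders 1 and 2 gives $3 > 2$, which satisfies it.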

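A restriction with $m_R = 2$ is given by

$$[R_{2h}] = \frac{1}{16}\begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 3 & 2 & 0 \\ 0 & 2 & 3 & 1 \\ 0 & 0 & 1 & 1 \end{bmatrix}. \qquad (3.7)$$

At the boundaries, (3.7) has to be modified. For a Dirichlet boundary condition we obtain at the boundary $y = 1$ or at the boundary $x = 0$:

$$[R_{2h}] = \frac{1}{16}\begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 2 & 2 & 0 \\ 0 & 2 & 3 & 1 \\ 0 & 0 & 1 & 1 \end{bmatrix}, \qquad (3.8)$$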
and similarly at other parts of the boundary. This restriction is obtained as the adjoint of linear interpolation. For simplicity, (3.8) is also used in the case of Neumann boundary conditions.

Let the system of equations to be solved on $\Omega_h$ be denoted as

$$A_h\phi_h = f_h. \qquad (3.9)$$

On $\Omega_{2h}$, $A_h$ is approximated by

$$A_{2h} = R_{2h}A_hP_h. \qquad (3.10)$$


Let $A_h$ have a 7-point stencil:

$$[A_h] = \begin{bmatrix} * & * & 0 \\ * & * & * \\ 0 & * & * \end{bmatrix}. \qquad (3.11)$$

Then it is found that with $P_h$, $R_{2h}$ given by (3.1) and (3.7)-(3.8), $A_{2h}$ as given by (3.10) also has a 7-point stencil. This is also true if $P_h$ is given by (3.7)-(3.8) and $R_{2h}$ by (3.4). However, if both $P_h$ and $R_{2h}$ are given by (3.7)-(3.8), then the stencil of $A_{2h}$ is larger than that of $A_h$. Therefore it was decided to choose $R_{2h}$ according to (3.7)-(3.8) and $P_h$ according to (3.1). Note that we then have $m_P + m_R = 3 > 2m = 2$, which suffices.

It is easy to obtain $A_{2h}$ explicitly from (3.10), with the choice just made for $P_h$ and $R_{2h}$. It is found that $A_{2h}$ corresponds to the following discrete equation on the coarse grid (cf. eq. (2.13)):

$$-(\bar\nabla_x \bar w^x \bar\Delta_x + \bar\nabla_y \bar w^y \bar\Delta_y)\,\bar\phi = \bar f, \qquad (3.12)$$

where quantities belonging to the coarse grid are denoted by an overbar. We find the following simple relation for $\bar w^x$, $\bar w^y$:

$$\bar w^x_{ij} = \tfrac12\left(w^x_{2i,2j} + w^x_{2i,2j-1}\right), \qquad \bar w^y_{ij} = \tfrac12\left(w^y_{2i,2j} + w^y_{2i-1,2j}\right). \qquad (3.13)$$

Hence, in this case construction of coarse-grid matrices by (3.10) (Galerkin approximation) is extremely cheap. Note that $A_{2h}$ is symmetric.


4.  NUMERICAL  EXPERIMENTS 

The  multigrid  schedule  used  is  the  W-cycle  with  one  post-smoothing  iteration.  The 
smoothing  method  is  the  ILU-method  described  in  Wesseling  (1982,  1987). 

The test problems are the interface problems sketched in Fig. 4.1. In the first problem we have two concentric squares; in the second problem the inner square is rotated over 45°. The sides of the outer square have length 1; those of the inner square have length $nh$ in problem 1 and $nh/\sqrt2$ in problem 2. In the inner square we have $a = a_1 = 0.333 \times 10^{k}$ [exponent $k$ illegible in the source]; in the outer square $a = a_2 = 2$. Problem 3 was suggested by Achi Brandt. The cells with centers at $x = (n - \tfrac12)h$ constitute a vertical isolating strip of width $h$, where the value of the diffusion coefficient is $a = a_1 = 10^{k}$ [exponent illegible in the source]; outside the strip, $a = a_2 = 2$.

Figure 4.1. Geometry of test problems (problems 1, 2 and 3)

We solve (1.1) with $f = xy$, $g = x^2 + y^2$, and starting iterand zero. Twelve iterations were carried out. The average reduction factor $\rho$ is defined as

$$\rho = \left(\|r^m\| / \|r^0\|\right)^{1/m},$$

with $\|\cdot\|$ the $\ell_2$-norm, $r$ the residue $r = b^h - A_h\phi^h$ on the finest grid (with $A_h\phi^h = b^h$ the system to be solved), $r^0$ the initial residue, $r^m$ the final residue, and $m$ the number of multigrid iterations carried out. The following table gives $n$ and $\rho$ for a number of cases. Where $n \neq 0$ we have taken the worst case for all $0 < n < h^{-1}$. The last column is for Neumann boundary conditions along $x = 0$ and $y = 0$.


Problem \ h^-1    8             16            32            64            64 (Neumann)
1                 0, .059       0, .077       0, .085       0, .091       0, .090
1                 6, .312       10, .362      26, .304      58, .290      58, .298
2                 6, .245       10, .300      26, .273      58, .237      58, .220
3                 1, .061       10, .074      18, .147      34, .299      34, .372

Table 4.1. n, ρ for problems 1, 2 and 3.
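The average reduction factor is the geometric mean of the residual reduction per iteration. The minimal sketch below computes it from residual vectors. The function names are hypothetical. Any constant scaling of the norm, such as an $h$-weighted $\ell_2$ norm, cancels in the ratio.

```python
import math


def l2_norm(r):
    """Euclidean norm of a residual vector."""
    return math.sqrt(sum(v * v for v in r))


def average_reduction_factor(r0, rm, m):
    """rho = (||r^m|| / ||r^0||) ** (1/m) after m multigrid iterations."""
    return (l2_norm(rm) / l2_norm(r0)) ** (1.0 / m)
```

For instance, twelve iterations that reduce the residual norm by a factor $10^{12}$ give $\rho = 0.1$, about the value reported for the pure Poisson case ($n = 0$).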


It is clear that multigrid works efficiently. For problems 1 and 2, $\rho$ does not increase as $h$ decreases. With $n \neq 0$, $\rho$ is larger than with $n = 0$ (Poisson equation). We think this is because, for $a_1 \gg a_2$, the equations in the inner square are almost uncoupled from those outside, so that we almost have a discretized pure Neumann problem for the interior square, which is singular. This hypothesis is confirmed by the fact that with $a_1$ and $a_2$ interchanged ($a_1 \ll a_2$), $\rho$ is found to be about the same size for all $n$, including 0. For problem 3, $\rho$ increases as $h$ decreases for certain locations of the isolating strip. As suggested by Achi Brandt, this is thought to be because, according to eq. (3.13), the isolation (small value of $w$) between the regions separated by the vertical strip may disappear after two coarsenings. Nevertheless, convergence is still rapid. Inspection of the last column of Table 4.1 shows that introduction of Neumann boundary conditions has little influence. Therefore it does not seem worthwhile to abandon (3.8) along non-Dirichlet boundaries.
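Brandt's explanation can be checked by applying (3.13) directly. The coarse $x$-face $I$ inherits the fine $x$-faces $2I$, so only faces with even index survive each coarsening. A one-cell-wide strip has two small-weight faces with consecutive indices, and whether either survives two coarsenings depends on where the strip sits. The minimal sketch below tracks only $w^x$ for a strip with $a_1 = 10^{-10}$ and $a_2 = 2$. The value of $a_1$ is our assumption, because the exponent is illegible in the source, and the function names are hypothetical.

```python
def face_weights_x(a, n):
    """w^x of (2.10) on each face between cells (i, j) and (i+1, j), 1-based."""
    return {(i, j): 2 * a[i, j] * a[i + 1, j] / (a[i, j] + a[i + 1, j])
            for i in range(1, n) for j in range(1, n + 1)}


def coarsen_wx(wx, n):
    """Coarse-grid w^x by (3.13); returns the new weights and coarse size."""
    nc = n // 2
    wbar = {(I, J): 0.5 * (wx[2 * I, 2 * J] + wx[2 * I, 2 * J - 1])
            for I in range(1, nc) for J in range(1, nc + 1)}
    return wbar, nc


def min_weight_after(strip, n=16, a1=1e-10, a2=2.0, levels=2):
    """Smallest x-face weight after `levels` coarsenings, for a vertical strip
    of cells in column `strip` with a = a1 (a = a2 elsewhere)."""
    a = {(i, j): (a1 if i == strip else a2)
         for i in range(1, n + 1) for j in range(1, n + 1)}
    wx = face_weights_x(a, n)
    for _ in range(levels):
        wx, n = coarsen_wx(wx, n)
    return min(wx.values())
```

With the strip in column 5, fine face 4 survives both coarsenings and the isolation persists. With the strip in column 3, it survives one coarsening but not two, after which every coarse weight equals $a_2$. This matches the location-dependent behavior of problem 3 in Table 4.1.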

With  another  smoothing  method,  namely  point  Gauss-Seidel,  similar  results  were 
obtained. 

5.  DISCUSSION 

Multigrid  methods  that  work  for  elliptic  equations  with  discontinuous  coefficients 
(interface  problems)  have  been  described  by  Alcouffe  et  al.  (1981),  Dendy  (1982), 
Kettler  and  Meijerink  (1981),  Kettler  (1982)  and  in  the  present  work.  The  present 
method  differs  from  the  earlier  ones  in  that  grid  coarsening  is  done  cell-wise  rather  than 
point-wise,  and  prolongation  and  restriction  are  not  dependent  on  the  equations.  As  a 
result,  the  present  method  is  simpler  and  requires  less  storage. 

Comparing  the  rates  of  convergence  that  are  reported  one  gets  the  impression  that 
the  present  method  is  at  least  as  efficient  as  the  earlier  ones. 

Thanks to the simplicity of the present method, it can be justified theoretically. The theory will be given elsewhere. Why does the present method work? An important factor is probably that (3.12) is quite similar to (2.13). This suggests that the present prolongation and restriction result in an accurate coarse-grid approximation. The similarity between (3.12) and (2.13) also simplifies the theory.

Extension  to  3D  seems  easier  than  for  the  older  methods.  The  same  considerations  as 
for  the  older  methods  are  expected  to  apply  to  the  extension  to  systems  of  differential 
equations. 